[I] [BUG] HiveIncrementalPuller leaks Scanner file handle and validates SQL after opening JDBC connection [hudi]

via GitHub Wed, 01 Apr 2026 11:45:30 -0700


linliu-code opened a new issue, #18440:
URL: https://github.com/apache/hudi/issues/18440


   ### Bug Description
   
   **What happened:**
   `HiveIncrementalPuller.executeIncrementalSQL` opens a `Scanner` on the
   incremental SQL file without closing it
   (`new Scanner(new File(...)).useDelimiter("\\Z").next()`), leaking the file
   handle on every invocation.

   Additionally, SQL validation (checking that the file references the correct
   source table and contains the `_hoodie_commit_time` predicate) is embedded
   inside `executeIncrementalSQL`, which is only called after a JDBC connection
   has already been established. A misconfigured SQL file therefore causes the
   connection to be opened and a temp table drop to be issued before the error
   is caught.

   The method is also private, making it impossible to unit-test the SQL
   rendering logic without a live Hive server.
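
The leak described above can be avoided by scoping the `Scanner` to a try-with-resources block. The following is a minimal sketch, not the Hudi implementation: `SqlFileReader` and `readAll` are hypothetical names, and only the `useDelimiter("\\Z")` idiom is taken from the issue text.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical helper for illustration; not a Hudi class.
final class SqlFileReader {
  private SqlFileReader() {}

  // Reads the file as a single token using the "\\Z" delimiter from the issue.
  // try-with-resources closes the Scanner, and with it the underlying file
  // stream, on both normal return and exception.
  static String readAll(File sqlFile) throws FileNotFoundException {
    try (Scanner scanner = new Scanner(sqlFile).useDelimiter("\\Z")) {
      // An empty file yields no token; return "" instead of letting next() throw.
      return scanner.hasNext() ? scanner.next() : "";
    }
  }
}
```

Closing the `Scanner` is sufficient here because `Scanner.close()` also closes the source it was constructed on.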
   
   **What you expected:**
   
     1. The `Scanner` should be closed after reading the SQL file
        (try-with-resources).
     2. SQL validation should happen eagerly, before any JDBC connection is
        opened, so a bad config file is caught at the cheapest possible point.
     3. `executeIncrementalSQL` should be testable in isolation (with a mocked
        `Statement`) without requiring a live Hive connection.
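
Eager validation as asked for in the list above could look roughly like the sketch below. It is an illustration under assumptions: `IncrementalSqlValidator` and `validate` are hypothetical names, `IllegalArgumentException` stands in for whatever exception type the project uses, and the substring checks are a simplification of the two conditions named in the issue (source table reference and `_hoodie_commit_time` predicate).

```java
// Hypothetical validator for illustration; not a Hudi class.
final class IncrementalSqlValidator {
  private IncrementalSqlValidator() {}

  // Throws IllegalArgumentException if the SQL text fails either check.
  // Both checks are plain substring matches, so they are deliberately loose.
  static void validate(String sql, String sourceDb, String sourceTable) {
    String qualifiedTable = sourceDb + "." + sourceTable;
    if (!sql.contains(qualifiedTable)) {
      throw new IllegalArgumentException(
          "Incremental SQL does not reference " + qualifiedTable);
    }
    if (!sql.contains("_hoodie_commit_time")) {
      throw new IllegalArgumentException(
          "Incremental SQL does not filter on _hoodie_commit_time");
    }
  }
}
```

Because the check needs only the SQL text and two config values, it can run before any connection is created, so a bad file would fail without touching Hive.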
   
   **Steps to reproduce:**
     1. Configure `HiveIncrementalPuller` with an `incrementalSQLFile` that
        references the wrong source table (e.g., wrong `sourceDb.sourceTable`).
     2. Call `saveDelta()`.
     3. Observe that a JDBC connection is opened and
        `DROP TABLE IF EXISTS <tempTable>` is executed before the validation
        exception is thrown, and that the `Scanner` opened on the SQL file is
        never closed.
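
Step 3 depends on seeing which statements reach Hive. One way to make that observable without a live server is a recording stub for `java.sql.Statement`. The sketch below builds one with a JDK dynamic proxy; a mocking library would serve equally well. `RecordingStatements`, `recording`, and `renderAndExecute` are hypothetical names, and the `String.format` rendering is a stand-in that is not necessarily how `HiveIncrementalPuller` renders its template.

```java
import java.lang.reflect.Proxy;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical test helper for illustration; not a Hudi class.
final class RecordingStatements {
  private RecordingStatements() {}

  // Returns a Statement that appends every execute(String) argument to
  // `executed` and rejects any other method call.
  static Statement recording(List<String> executed) {
    return (Statement) Proxy.newProxyInstance(
        RecordingStatements.class.getClassLoader(),
        new Class<?>[] {Statement.class},
        (proxy, method, args) -> {
          if ("execute".equals(method.getName()) && args != null && args.length == 1) {
            executed.add((String) args[0]);
            return Boolean.FALSE; // execute(String) returns false when there is no result set
          }
          throw new UnsupportedOperationException(method.getName());
        });
  }

  // Stand-in for the logic under test: fills the commit time into the
  // template and runs the result on the supplied Statement.
  static void renderAndExecute(String template, String lastCommitTime, Statement stmt)
      throws SQLException {
    stmt.execute(String.format(template, lastCommitTime));
  }
}
```

With the `Statement` passed in as a parameter, a test can assert on the recorded SQL strings, including whether a `DROP TABLE` was issued, without opening a JDBC connection.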
   
   
   ### Environment
   
   **Hudi version:** master
   **Query engine:** Hive
   **Relevant configs:** 
   
   
   ### Logs and Stack Trace
   
   _No response_


