linliu-code opened a new issue, #18440:
URL: https://github.com/apache/hudi/issues/18440
### Bug Description
**What happened:**
`HiveIncrementalPuller.executeIncrementalSQL` opens a `Scanner` on the
incremental SQL file without closing it
(`new Scanner(new File(...)).useDelimiter("\\Z").next()`), leaking the file
handle on every invocation. Additionally, SQL validation (checking that
the file references the correct source table and contains the
`_hoodie_commit_time` predicate) is embedded inside
`executeIncrementalSQL`, which is only called after a JDBC connection
has already been established. A misconfigured SQL file therefore causes the
connection to be opened and a temp table drop to be issued before the error
is caught. The method is also private, making it impossible to unit-test the
SQL rendering logic without a live Hive server.
**What you expected:**
1. The `Scanner` should be closed after reading the SQL file
(try-with-resources).
2. SQL validation should happen eagerly, before any JDBC connection is
opened, so a bad config file is caught at the cheapest possible point.
3. `executeIncrementalSQL` should be testable in isolation (mocked
`Statement`) without requiring a live Hive connection.
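As a rough illustration of points 1 and 2, the sketch below reads the SQL file with try-with-resources and validates it with no JDBC objects involved. The class `IncrementalSqlReader`, its method names, and the simplified `contains` checks are assumptions made for this example; they are not the existing Hudi code or a proposed patch.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical helper for illustration; not an existing Hudi class.
final class IncrementalSqlReader {

  // Reads the whole SQL file. try-with-resources closes the Scanner,
  // which in turn closes the underlying file handle.
  static String readSql(File sqlFile) throws FileNotFoundException {
    try (Scanner scanner = new Scanner(sqlFile)) {
      // Same "\\Z" delimiter idiom as in the issue: one token spanning the file.
      scanner.useDelimiter("\\Z");
      return scanner.hasNext() ? scanner.next() : "";
    }
  }

  // Simplified checks (assumed): the SQL must mention db.table and the
  // _hoodie_commit_time column. Needs no JDBC connection, so it can run first.
  static void validate(String sql, String sourceDb, String sourceTable) {
    String qualified = sourceDb + "." + sourceTable;
    if (!sql.contains(qualified)) {
      throw new IllegalArgumentException("Incremental SQL does not reference " + qualified);
    }
    if (!sql.contains("_hoodie_commit_time")) {
      throw new IllegalArgumentException("Incremental SQL lacks a _hoodie_commit_time predicate");
    }
  }
}
```

Calling `validate(readSql(file), sourceDb, sourceTable)` before opening the connection would surface a misconfigured file without touching Hive.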
**Steps to reproduce:**
1. Configure `HiveIncrementalPuller` with an `incrementalSQLFile` that
references the wrong source table (e.g., wrong `sourceDb.sourceTable`).
2. Call `saveDelta()`.
3. Observe that a JDBC connection is opened and
`DROP TABLE IF EXISTS <tempTable>` is executed before the validation
exception is thrown, and that the `Scanner` opened on the SQL file is never
closed.
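The ordering in step 3 could be checked without a live Hive server by passing in a `Statement` that only records what it is asked to execute. The sketch below builds one with `java.lang.reflect.Proxy` instead of a mocking library. `EagerValidationDemo`, `recordingStatement`, and `run` are hypothetical names, and `run` is a simplified stand-in for the desired validate-then-execute flow, not the actual `executeIncrementalSQL`.

```java
import java.lang.reflect.Proxy;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical demo class; none of these names exist in Hudi.
final class EagerValidationDemo {

  // Returns a Statement that appends every execute(String) call to `executed`.
  static Statement recordingStatement(List<String> executed) {
    return (Statement) Proxy.newProxyInstance(
        EagerValidationDemo.class.getClassLoader(),
        new Class<?>[] {Statement.class},
        (proxy, method, args) -> {
          if (method.getName().equals("execute") && args != null && args.length >= 1) {
            executed.add((String) args[0]);
            return false; // execute(String) returns boolean
          }
          if (method.getName().equals("close")) {
            return null;
          }
          throw new UnsupportedOperationException(method.getName());
        });
  }

  // Assumed flow: validate first, and only then issue statements.
  static void run(String sql, String qualifiedSourceTable, String tempTable, Statement stmt) {
    if (!sql.contains(qualifiedSourceTable)) {
      throw new IllegalArgumentException("SQL does not reference " + qualifiedSourceTable);
    }
    try {
      stmt.execute("drop table if exists " + tempTable);
      stmt.execute(sql);
    } catch (SQLException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With a wrong source table, `run` throws before the recording list receives any entry, which is the behavior the issue asks for. With a matching table, the list holds the drop statement followed by the incremental SQL.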
### Environment
**Hudi version:** master
**Query engine:** Hive
**Relevant configs:**
### Logs and Stack Trace
_No response_