QiuYucheng2003 opened a new issue, #17985:
URL: https://github.com/apache/hudi/issues/17985
### Bug Description
What happened:
In HoodieCombineHiveInputFormat.java, the getSplits method relies on Hive's
Utilities class, which uses a ThreadLocal to store MapWork.
The method does call Utilities.clearWorkMapForConf(job) explicitly near its
end (Line 427), but the call is not wrapped in a finally block.
If an exception is thrown earlier (e.g., inside getNonCombinablePathIndices
at Line 407, or getCombineSplits at Line 415), the method exits immediately
and the cleanup call is skipped, leaving stale split information in the
ThreadLocal. In a pooled-thread environment, a later request served by the
same thread can then see that stale state (context contamination).
What you expected:
The Utilities.clearWorkMapForConf(job) call must be placed inside a finally
block to guarantee that ThreadLocal resources are cleaned up regardless of
whether the method completes successfully or throws an exception.
Steps to reproduce:
1. Review the code in HoodieCombineHiveInputFormat.java.
2. Locate the getSplits method (Lines 371-468).
3. Observe that Utilities.clearWorkMapForConf(job) is located at Line 427.
4. Trace the error handling: if getNonCombinablePathIndices throws an
exception, it is caught and rethrown as an IOException at Line 407.
5. Confirm that in this scenario Line 427 is never reached, so the
ThreadLocal state is never cleared.
### Environment
Hudi version: main
Query engine: Hive (specifically HiveServer2 with Thread Pooling)
JDK 8
macOS 13.0
### Logs and Stack Trace
Location: org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.java
// Current unsafe implementation (simplified):
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
  init(job); // Initializes ThreadLocal data via Utilities
  // ... extensive logic ...
  // If this throws IOException, the method exits here
  getNonCombinablePathIndices(job, paths, numThreads);
  // CLEANUP IS HERE (Unsafe) - Skipped on exception
  Utilities.clearWorkMapForConf(job);
  return result.toArray(...);
}

// Recommended fix:
public InputSplit[] getSplits(...) {
  try {
    // ... all business logic ...
  } finally {
    Utilities.clearWorkMapForConf(job); // Guaranteed cleanup
  }
}
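
For illustration, the leak pattern can be shown without Hive or Hudi on the classpath. The sketch below is not the actual Hudi code: the class ThreadLocalCleanupSketch, the WORK ThreadLocal, and the methods unsafeGetSplits, safeGetSplits, and staleAfterFailure are hypothetical stand-ins for the state that Utilities keeps per thread and for the two shapes of getSplits described above. It runs a failing call on a single pooled thread, then checks from a second task on the same thread whether the state survived.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalCleanupSketch {
    // Hypothetical stand-in for the per-thread state held by Hive's Utilities.
    private static final ThreadLocal<String> WORK = new ThreadLocal<>();

    // Mirrors the current shape: cleanup is the last statement, outside finally.
    public static void unsafeGetSplits(boolean fail) throws IOException {
        WORK.set("map-work-for-request");
        if (fail) {
            throw new IOException("simulated failure before cleanup");
        }
        WORK.remove(); // skipped when the exception above is thrown
    }

    // Mirrors the proposed shape: cleanup runs on both success and failure.
    public static void safeGetSplits(boolean fail) throws IOException {
        WORK.set("map-work-for-request");
        try {
            if (fail) {
                throw new IOException("simulated failure before cleanup");
            }
        } finally {
            WORK.remove();
        }
    }

    // Runs a failing call on a one-thread pool, then reports whether a second
    // task on that same thread still sees the state left by the first.
    public static boolean staleAfterFailure(boolean useFinally) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> {
                try {
                    if (useFinally) {
                        safeGetSplits(true);
                    } else {
                        unsafeGetSplits(true);
                    }
                } catch (IOException expected) {
                    // The failure is intentional; only the leftover state matters.
                }
            }).get();
            return pool.submit(() -> WORK.get() != null).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without finally: stale=" + staleAfterFailure(false));
        System.out.println("with finally:    stale=" + staleAfterFailure(true));
    }
}
```

With the cleanup outside finally, the second task observes the value left by the failed call. With the cleanup inside finally, it observes nothing. This is the same property the proposed change is meant to give getSplits.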