QiuYucheng2003 opened a new issue, #17985:
URL: https://github.com/apache/hudi/issues/17985

   ### Bug Description
   
   What happened: 
   In HoodieCombineHiveInputFormat.java, the getSplits method relies on Hive's 
Utilities class which uses ThreadLocal to store MapWork.
   
   While there is an explicit cleanup call Utilities.clearWorkMapForConf(job) 
at the end of the method (Line 427), it is not wrapped in a finally block.
   
   If an exception is thrown before that point (e.g., inside 
getNonCombinablePathIndices at Line 407, or getCombineSplits at Line 415), the 
method exits without reaching the cleanup call, so stale split information 
stays in the ThreadLocal. In a pooled-thread environment, the next request 
served by the same thread can then see state left behind by the failed one.
   
   What you expected: 
   The Utilities.clearWorkMapForConf(job) call must be placed inside a finally 
block to guarantee that ThreadLocal resources are cleaned up regardless of 
whether the method completes successfully or throws an exception.
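
   The guarantee asked for here can be illustrated with a minimal, self-contained sketch. It assumes a plain ThreadLocal as a stand-in for the state Hive keeps per thread; FinallyCleanupSketch, WORK, and the simplified getSplits signature are hypothetical names for illustration, not Hudi or Hive APIs.

```java
import java.io.IOException;

// Illustration only: none of these names are Hudi or Hive APIs.
class FinallyCleanupSketch {
    // Hypothetical per-thread state, playing the role of the cached MapWork.
    static final ThreadLocal<String> WORK = new ThreadLocal<>();

    // Computes placeholder "splits" and always clears the per-thread state on exit.
    static String[] getSplits(String plan, boolean fail) throws IOException {
        WORK.set(plan); // analogous to init(job)
        try {
            if (fail) {
                throw new IOException("simulated failure while computing splits");
            }
            return new String[] {plan + "-split-0"};
        } finally {
            WORK.remove(); // analogous to Utilities.clearWorkMapForConf(job)
        }
    }

    // Returns true if the thread-local is empty after a failing call.
    static boolean cleanAfterFailure() {
        try {
            getSplits("plan-A", true);
        } catch (IOException expected) {
            // The failure is expected; only the leftover state matters here.
        }
        return WORK.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("clean after failure: " + cleanAfterFailure());
    }
}
```

   Because the cleanup sits in the finally block, it runs on both the normal return path and the exception path, so the calling thread holds no leftover state in either case.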
   
   Steps to reproduce:
   
   1. Review the code in HoodieCombineHiveInputFormat.java.
   
   2. Locate the getSplits method (Lines 371-468).
   
   3. Observe that Utilities.clearWorkMapForConf(job) is located at Line 427.
   
   4. Trace the error handling: if getNonCombinablePathIndices throws an 
exception, it is caught and rethrown as an IOException at Line 407.
   
   5. Confirm that in this scenario, Line 427 is never reached, so the 
ThreadLocal entry is not cleared.
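
   The same failure mode can be reproduced outside Hive with a small standalone sketch. It assumes a single-thread pool standing in for a pooled server thread and a plain ThreadLocal standing in for the per-thread work cache; PooledThreadLeakSketch, WORK, and unsafeRequest are hypothetical names, not Hudi or Hive code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: none of these names are Hudi or Hive APIs.
class PooledThreadLeakSketch {
    // Hypothetical per-thread state, playing the role of the cached MapWork.
    static final ThreadLocal<String> WORK = new ThreadLocal<>();

    // Mirrors the shape described in the issue: cleanup is the last statement,
    // not inside a finally block, so it is skipped when an exception is thrown.
    static void unsafeRequest(String plan, boolean fail) {
        WORK.set(plan);
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
        WORK.remove();
    }

    // Runs a failing request, then reads the thread-local from a later task
    // that executes on the same pooled thread.
    static String valueSeenByNextRequest() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Runnable failing = () -> unsafeRequest("query-1", true);
            Future<?> first = pool.submit(failing);
            try {
                first.get();
            } catch (Exception expected) {
                // The simulated failure surfaces here, wrapped in an ExecutionException.
            }
            Callable<String> readState = () -> WORK.get();
            return pool.submit(readState).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("next request sees: " + valueSeenByNextRequest());
    }
}
```

   In this sketch the second task observes the value set by the failed first task, which is the cross-request contamination described above. Moving the remove() call into a finally block makes the second task see null instead.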
   
   ### Environment
   
   Hudi version: main
   
   Query engine: Hive (specifically HiveServer2 with Thread Pooling)
   
   JDK 8
   
   macOS 13.0
   
   ### Logs and Stack Trace
   
   Location: org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.java
   
   // Current unsafe implementation (simplified):
   public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
       init(job); // Initializes ThreadLocal data via Utilities
       
       // ... extensive logic ...
       
       // If this throws IOException, the method exits here
       getNonCombinablePathIndices(job, paths, numThreads); 
   
       // CLEANUP IS HERE (Unsafe) - Skipped on exception
       Utilities.clearWorkMapForConf(job); 
       
       return result.toArray(...);
   }
   
   // Recommended fix:
   public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
       try {
           // ... all business logic ...
       } finally {
           Utilities.clearWorkMapForConf(job); // Guaranteed cleanup
       }
   }

