QiuYucheng2003 opened a new issue, #15284:
URL: https://github.com/apache/iceberg/issues/15284

   ### Apache Iceberg version
   
   main (development)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I identified a ThreadLocal misuse in org.apache.iceberg.spark.CommitMetadata 
that causes a ThreadLocalMap capacity leak.
   
   In the withCommitProperties method, the COMMIT_PROPERTIES ThreadLocal is 
cleaned up in the finally block using .set(ImmutableMap.of()) instead of 
.remove().
   
   Location:
   // CommitMetadata.java
   } finally {
     // Current implementation: clears the value but keeps the Entry in ThreadLocalMap
     COMMIT_PROPERTIES.set(ImmutableMap.of());
   }
   
   Impact: Because .set() stores an empty map instead of deleting the mapping, 
the Entry keyed by this ThreadLocal stays in the current thread's 
ThreadLocalMap. The map only expunges entries whose ThreadLocal key has been 
garbage-collected, so as long as the ThreadLocal stays reachable and the thread 
is alive, the slot is never reclaimed. In long-running threads (e.g., Spark 
executors), this is a capacity leak: the entry keeps occupying a table slot, 
which can contribute to unnecessary table growth and more hash collisions.
   
   
   Steps to Reproduce:
   1. Execute CommitMetadata.withCommitProperties(...) in a long-running thread.
   
   2. After the method returns, inspect the current thread's ThreadLocalMap 
(e.g., using a debugger or reflection).
   
   3. Observation: The Entry for COMMIT_PROPERTIES is still present in the map 
(pointing to an empty Map), occupying a table slot.
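
   The observation in step 3 can also be checked without a debugger or 
reflection. The sketch below is an illustration under assumptions, not code 
from Iceberg: PROPS is a hypothetical stand-in for COMMIT_PROPERTIES, it uses 
java.util.Map.of (Java 9+) in place of ImmutableMap.of, and it assumes the 
ThreadLocal is created with ThreadLocal.withInitial. It counts how often the 
initial-value supplier runs. ThreadLocal.get() only calls the supplier when 
the thread has no entry for the key, so a count of 0 after cleanup means the 
entry is still present.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalEntryProbe {
  // Counts how often the initial-value supplier runs.
  static final AtomicInteger INIT_CALLS = new AtomicInteger();

  // Hypothetical stand-in for COMMIT_PROPERTIES (assumption: created via withInitial).
  static final ThreadLocal<Map<String, String>> PROPS =
      ThreadLocal.withInitial(() -> {
        INIT_CALLS.incrementAndGet();
        return Map.of();
      });

  // Sets a value, cleans up with either remove() or set(empty), then reports
  // how many times the supplier ran during the next get() on this thread.
  static int initCallsAfterCleanup(boolean useRemove) {
    PROPS.set(Map.of("k", "v"));
    if (useRemove) {
      PROPS.remove();       // deletes the entry from this thread's map
    } else {
      PROPS.set(Map.of());  // overwrites the value, entry stays in the map
    }
    int before = INIT_CALLS.get();
    PROPS.get();            // re-initializes only if no entry exists
    int delta = INIT_CALLS.get() - before;
    PROPS.remove();         // leave the calling thread clean
    return delta;
  }

  public static void main(String[] args) {
    System.out.println("set(empty): " + initCallsAfterCleanup(false)); // prints "set(empty): 0"
    System.out.println("remove():   " + initCallsAfterCleanup(true));  // prints "remove():   1"
  }
}
```

   A result of 0 for the set(empty) path shows the entry survived the cleanup; 
a result of 1 for the remove() path shows the entry was deleted and had to be 
recreated on the next read.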
   
   Expected behavior:
   The entry for the ThreadLocal should be removed from the thread's 
ThreadLocalMap to prevent capacity leaks. The cleanup code should be:
   } finally {
     COMMIT_PROPERTIES.remove();
   }
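
   For context, a minimal, self-contained sketch of the scoped pattern with 
the proposed cleanup might look like the following. It is a simplified 
illustration, not the actual CommitMetadata code: the class name, the 
Supplier-based signature, and the use of java.util.Map.of (Java 9+) instead of 
ImmutableMap.of are all assumptions made to keep the example runnable on its 
own.

```java
import java.util.Map;
import java.util.function.Supplier;

public class ScopedCommitProperties {
  // Hypothetical stand-in for the COMMIT_PROPERTIES field.
  private static final ThreadLocal<Map<String, String>> COMMIT_PROPERTIES =
      ThreadLocal.withInitial(() -> Map.of());

  // Returns the properties visible to the current thread (empty when unset).
  static Map<String, String> commitProperties() {
    return COMMIT_PROPERTIES.get();
  }

  // Simplified, hypothetical signature: runs the action with the given
  // properties and always removes the thread-local entry afterwards.
  static <R> R withCommitProperties(Map<String, String> properties, Supplier<R> action) {
    COMMIT_PROPERTIES.set(properties);
    try {
      return action.get();
    } finally {
      COMMIT_PROPERTIES.remove(); // delete the entry instead of storing an empty map
    }
  }

  public static void main(String[] args) {
    int seen = withCommitProperties(
        Map.of("writer", "job-1"), () -> commitProperties().size());
    System.out.println(seen);                         // prints 1
    System.out.println(commitProperties().isEmpty()); // prints true
  }
}
```

   Callers still see an empty map after the scope ends, because the next 
get() falls back to the initial value, so under these assumptions the visible 
behavior matches the set(ImmutableMap.of()) version.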
   
   Actual behavior:
   The ThreadLocal entry persists in the ThreadLocalMap with an empty value 
after execution.
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

