QiuYucheng2003 opened a new issue, #15284:
URL: https://github.com/apache/iceberg/issues/15284
### Apache Iceberg version
main (development)
### Query engine
Spark
### Please describe the bug 🐞
I identified a ThreadLocal misuse in org.apache.iceberg.spark.CommitMetadata
that causes a ThreadLocalMap capacity leak.
In the withCommitProperties method, the COMMIT_PROPERTIES ThreadLocal is
cleaned up in the finally block using .set(ImmutableMap.of()) instead of
.remove().
Location:
// CommitMetadata.java
} finally {
  // Current implementation: clears the value but keeps the Entry in ThreadLocalMap
  COMMIT_PROPERTIES.set(ImmutableMap.of());
}
Impact: Because .set() is called with an empty map, the Entry keyed by this
ThreadLocal stays in the current thread's ThreadLocalMap. The map only expunges
entries whose key has been garbage collected, and this key is still reachable,
so the slot is never reclaimed. In long-running threads (e.g., Spark
executors), the entry occupies a table slot for the life of the thread, which
can contribute to unnecessary ThreadLocalMap growth and more hash collisions.
Steps to Reproduce:
1. Execute CommitMetadata.withCommitProperties(...) in a long-running thread.
2. After the method returns, inspect the current thread's ThreadLocalMap
(e.g., using a debugger or reflection).
3. Observation: The Entry for COMMIT_PROPERTIES is still present in the map
(pointing to an empty Map), occupying a table slot.
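The observation in step 3 can also be checked without a debugger or reflection. The following is a minimal sketch, not the Iceberg code: PROPS is a hypothetical stand-in for COMMIT_PROPERTIES, and the entryPresent() helper is an assumption of this example. It relies on the documented behavior that a ThreadLocal initializer runs only when the current thread has no entry for that ThreadLocal.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalEntryDemo {
  // Counts initializer runs; the initializer only runs when the thread has no entry.
  static final AtomicInteger INIT_CALLS = new AtomicInteger();

  // Hypothetical stand-in for CommitMetadata.COMMIT_PROPERTIES.
  static final ThreadLocal<Map<String, String>> PROPS =
      ThreadLocal.withInitial(() -> {
        INIT_CALLS.incrementAndGet();
        return Collections.<String, String>emptyMap();
      });

  /** Returns true if the current thread already held an entry for PROPS. */
  static boolean entryPresent() {
    int before = INIT_CALLS.get();
    PROPS.get(); // creates an entry via the initializer if none exists
    boolean present = INIT_CALLS.get() == before;
    if (!present) {
      PROPS.remove(); // undo the entry created by the probe itself
    }
    return present;
  }

  public static void main(String[] args) {
    PROPS.set(Collections.singletonMap("k", "v"));
    PROPS.set(Collections.<String, String>emptyMap()); // cleanup as currently implemented
    System.out.println("after set(empty): entry present = " + entryPresent());

    PROPS.remove(); // proposed cleanup
    System.out.println("after remove():   entry present = " + entryPresent());
  }
}
```

Run on a single thread, the first line should report that the entry is present and the second that it is not.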
Expected behavior:
The ThreadLocal should be completely removed from the map to prevent
capacity leaks. The cleanup code should be:
} finally {
  COMMIT_PROPERTIES.remove();
}
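For context, here is a minimal sketch of how the whole method could look with this cleanup. It is an illustration under assumptions, not the actual CommitMetadata source: the class name CommitPropertiesSketch, the simplified two-argument signature, and the commitProperties() accessor are hypothetical, and java.util collections stand in for ImmutableMap.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Callable;

final class CommitPropertiesSketch {
  // Hypothetical stand-in for COMMIT_PROPERTIES; defaults to an empty map.
  private static final ThreadLocal<Map<String, String>> COMMIT_PROPERTIES =
      ThreadLocal.withInitial(() -> Collections.<String, String>emptyMap());

  // Returns the properties visible to the current thread.
  static Map<String, String> commitProperties() {
    return COMMIT_PROPERTIES.get();
  }

  // Runs the callable with the given properties, then removes the thread's entry.
  static <R> R withCommitProperties(Map<String, String> properties, Callable<R> callable)
      throws Exception {
    COMMIT_PROPERTIES.set(properties);
    try {
      return callable.call();
    } finally {
      COMMIT_PROPERTIES.remove(); // drops the Entry instead of overwriting its value
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> props = Collections.singletonMap("writer", "job-1");
    String seen = withCommitProperties(props, () -> commitProperties().get("writer"));
    System.out.println(seen); // prints "job-1"
    System.out.println(commitProperties().isEmpty()); // prints "true"
  }
}
```

One point to keep in mind: when the ThreadLocal is created with an initializer, a later get() on the same thread re-creates an entry holding the initial value, so callers still see an empty map after remove().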
Actual behavior:
The ThreadLocal entry persists in the ThreadLocalMap with an empty value
after execution.
### Willingness to contribute
- [x] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.