QiuYucheng2003 opened a new issue, #3398:
URL: https://github.com/apache/parquet-java/issues/3398
### Describe the bug, including details regarding any error messages,
version, and platform.
*(Note: This vulnerability was identified via Static Program Analysis
targeting ThreadLocal misuses in long-lived thread pool environments.)*
**Description:**
There is a severe ClassLoader leak risk (ClassLoader Pinning / Metaspace
OOM) in `org.apache.parquet.io.api.Binary$FromCharSequenceBinary`.
Around line 264, a `ThreadLocal` is initialized using a method reference
(Lambda):
```java
private static final ThreadLocal<CharsetEncoder> ENCODER =
ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder);
The Root Cause:
1. The lambda StandardCharsets.UTF_8::newEncoder generates a dynamic
Supplier implementation at runtime. This generated class is loaded by the
application's current ClassLoader (e.g., Spark/Flink UserClassLoader or a Web
container's WebappClassLoader).
2. ThreadLocal.withInitial() returns a SuppliedThreadLocal, which holds a
strong reference to this lambda instance internally via its supplier field.
3. Binary is a low-level core API without any explicit lifecycle hooks,
meaning ENCODER.remove() is never called.
4. In big data computing engines or web containers, worker threads
(Executors/NIO threads) are pooled and long-lived. When a user job is cancelled
or an application is hot-redeployed, the worker threads survive. The retained
SuppliedThreadLocal keeps the lambda instance alive, which in turn permanently
pins the entire application ClassLoader, preventing garbage collection.
Impact:
Multiple job submissions, cancellations, or hot-redeployments in the same
JVM will cause linear Metaspace inflation, inevitably leading to a
java.lang.OutOfMemoryError: Metaspace.
Expected behavior:
Low-level data structures like Binary should not hold static references that
can unintentionally pin dynamic classloaders. The encoding logic should be
stateless to ensure safe class unloading.
Suggested Fix:
Remove the ThreadLocal cache entirely. The performance gain of caching
CharsetEncoder here is heavily outweighed by the ClassLoader leak risk. A
stateless conversion is much safer:
private static ByteBuffer encodeUTF8(CharSequence value) {
return
ByteBuffer.wrap(value.toString().getBytes(StandardCharsets.UTF_8));
}
### Component(s)
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]