QiuYucheng2003 opened a new issue, #3398:
URL: https://github.com/apache/parquet-java/issues/3398

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   *(Note: This vulnerability was identified via Static Program Analysis 
targeting ThreadLocal misuses in long-lived thread pool environments.)*
   
   **Description:**
   There is a severe ClassLoader leak risk (ClassLoader Pinning / Metaspace 
OOM) in `org.apache.parquet.io.api.Binary$FromCharSequenceBinary`.
   
   Around line 264, a `ThreadLocal` is initialized using a method reference 
(Lambda):
   ```java
   private static final ThreadLocal<CharsetEncoder> ENCODER =
       ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder);
   
   
   The Root Cause:
   
   1. The lambda StandardCharsets.UTF_8::newEncoder generates a dynamic 
Supplier implementation at runtime. This generated class is loaded by the 
application's current ClassLoader (e.g., Spark/Flink UserClassLoader or a Web 
container's WebappClassLoader).
   
   2. ThreadLocal.withInitial() returns a SuppliedThreadLocal, which holds a 
strong reference to this lambda instance internally via its supplier field.
   
   3. Binary is a low-level core API without any explicit lifecycle hooks, 
meaning ENCODER.remove() is never called.
   
   4. In big data computing engines or web containers, worker threads 
(Executors/NIO threads) are pooled and long-lived. When a user job is cancelled 
or an application is hot-redeployed, the worker threads survive. The retained 
SuppliedThreadLocal keeps the lambda instance alive, which in turn permanently 
pins the entire application ClassLoader, preventing garbage collection.
   
   Impact:
   Multiple job submissions, cancellations, or hot-redeployments in the same 
JVM will cause linear Metaspace inflation, inevitably leading to a 
java.lang.OutOfMemoryError: Metaspace.
   
   Expected behavior:
   Low-level data structures like Binary should not hold static references that 
can unintentionally pin dynamic classloaders. The encoding logic should be 
stateless to ensure safe class unloading.
   
   Suggested Fix:
   Remove the ThreadLocal cache entirely. The performance gain of caching 
CharsetEncoder here is heavily outweighed by the ClassLoader leak risk. A 
stateless conversion is much safer:
   private static ByteBuffer encodeUTF8(CharSequence value) {
       return 
ByteBuffer.wrap(value.toString().getBytes(StandardCharsets.UTF_8));
   }
   
   ### Component(s)
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to