[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

GitBox Sun, 15 May 2022 15:12:27 -0700


shangxinli commented on code in PR #959:
URL: https://github.com/apache/parquet-mr/pull/959#discussion_r873234812



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java:
##########
@@ -44,8 +45,15 @@ public class CodecFactory implements CompressionCodecFactory {
   protected static final Map<String, CompressionCodec> CODEC_BY_NAME = Collections
       .synchronizedMap(new HashMap<String, CompressionCodec>());
 
-  private final Map<CompressionCodecName, BytesCompressor> compressors = new HashMap<CompressionCodecName, BytesCompressor>();
-  private final Map<CompressionCodecName, BytesDecompressor> decompressors = new HashMap<CompressionCodecName, BytesDecompressor>();
+  /*
+  See: https://issues.apache.org/jira/browse/PARQUET-2126
+  The old implementation stored a single global instance of each type of compressor and decompressor, which
+  broke thread safety. The solution here is to store one instance of each codec type per-thread.
+  Normally, one would use ThreadLocal<> here, but the release() method needs to iterate over all codecs
+  ever created, so we have to implement the per-thread management explicitly.
+   */
+  private final Map<Thread, Map<CompressionCodecName, BytesCompressor>> all_compressors = new ConcurrentHashMap<>();

Review Comment:
   In Java, we don't use '_' in non-constant field names. I think just calling it compressors should be fine.
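A minimal sketch of the per-thread approach described in the diff comment, using the reviewer's suggested field name. It assumes a codec only needs a release() hook; PooledCodec, PerThreadCodecCache and PerThreadCodecDemo are hypothetical names for illustration, not parquet-mr classes. The outer map is keyed by Thread so that release() can reach codecs created by every thread, which a ThreadLocal would not expose.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for BytesCompressor/BytesDecompressor; not the parquet-mr API.
interface PooledCodec {
  void release();
}

// Hypothetical per-thread cache: one codec instance per (thread, codec name).
class PerThreadCodecCache<K, C extends PooledCodec> {
  // Named "compressors" (camelCase, no underscore) as the review suggests.
  private final Map<Thread, Map<K, C>> compressors = new ConcurrentHashMap<>();
  private final Function<K, C> factory;

  PerThreadCodecCache(Function<K, C> factory) {
    this.factory = factory;
  }

  // Returns the calling thread's codec for this name, creating it on first use.
  C get(K name) {
    Map<K, C> mine =
        compressors.computeIfAbsent(Thread.currentThread(), t -> new ConcurrentHashMap<>());
    return mine.computeIfAbsent(name, factory);
  }

  // Releases the codecs of every thread, not just the calling one,
  // which is why the outer map exists instead of a ThreadLocal.
  void release() {
    for (Map<K, C> perThread : compressors.values()) {
      perThread.values().forEach(PooledCodec::release);
    }
    compressors.clear();
  }
}

public class PerThreadCodecDemo {
  // Counts release() calls across all test codecs.
  static final AtomicInteger RELEASED = new AtomicInteger();

  static final class CountingCodec implements PooledCodec {
    @Override
    public void release() {
      RELEASED.incrementAndGet();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    PerThreadCodecCache<String, CountingCodec> cache =
        new PerThreadCodecCache<>(name -> new CountingCodec());
    CountingCodec mainCodec = cache.get("SNAPPY");
    CountingCodec[] workerCodec = new CountingCodec[1];
    Thread worker = new Thread(() -> workerCodec[0] = cache.get("SNAPPY"));
    worker.start();
    worker.join();
    // Same thread and name reuse one instance; another thread gets its own.
    System.out.println(cache.get("SNAPPY") == mainCodec); // true
    System.out.println(workerCodec[0] != mainCodec);      // true
    cache.release();
    System.out.println(RELEASED.get());                   // 2
  }
}
```

One trade-off of keying by Thread: the map holds strong references to threads that have already exited (and to their codecs) until release() is called, whereas ThreadLocal values become collectable once their thread ends.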



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

