shangxinli commented on code in PR #959:
URL: https://github.com/apache/parquet-mr/pull/959#discussion_r873234812
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java:
##########
@@ -44,8 +45,15 @@ public class CodecFactory implements CompressionCodecFactory {
protected static final Map<String, CompressionCodec> CODEC_BY_NAME = Collections
.synchronizedMap(new HashMap<String, CompressionCodec>());
- private final Map<CompressionCodecName, BytesCompressor> compressors = new HashMap<CompressionCodecName, BytesCompressor>();
- private final Map<CompressionCodecName, BytesDecompressor> decompressors = new HashMap<CompressionCodecName, BytesDecompressor>();
+ /*
+ See: https://issues.apache.org/jira/browse/PARQUET-2126
+ The old implementation stored a single global instance of each type of compressor and decompressor, which
+ broke thread safety. The solution here is to store one instance of each codec type per-thread.
+ Normally, one would use ThreadLocal<> here, but the release() method needs to iterate over all codecs
+ ever created, so we have to implement the per-thread management explicitly.
+ */
+ private final Map<Thread, Map<CompressionCodecName, BytesCompressor>> all_compressors = new ConcurrentHashMap<>();
Review Comment:
Java naming conventions use camelCase for field names, not underscores. I think simply calling this field `compressors` should be fine.
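To make the suggestion concrete, here is a minimal sketch of the per-thread pattern the comment block describes, with the field named `compressors`. `PerThreadCache`, its `get` method, and the `int` return value of `release()` are hypothetical stand-ins for illustration only; this is not the PR's code and not the real CodecFactory API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for CodecFactory's per-thread bookkeeping.
class PerThreadCache<K, V> {
  // The outer map is shared by all threads, so it must be concurrent.
  // Each inner map is only touched by its owning thread and by release().
  private final Map<Thread, Map<K, V>> compressors = new ConcurrentHashMap<>();

  // Returns the calling thread's instance for the key, creating it on first use.
  V get(K key, Function<K, V> factory) {
    Map<K, V> perThread =
        compressors.computeIfAbsent(Thread.currentThread(), t -> new HashMap<>());
    return perThread.computeIfAbsent(key, factory);
  }

  // Walks every instance created by any thread, then forgets them all.
  // Returns how many instances were dropped. Assumes no thread is still
  // using the cache when this is called.
  int release() {
    int released = 0;
    for (Map<K, V> perThread : compressors.values()) {
      released += perThread.size();
      perThread.clear();
    }
    compressors.clear();
    return released;
  }
}
```

The explicit outer map is what lets `release()` reach instances owned by other threads, which a plain `ThreadLocal<>` would not allow.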