ggershinsky commented on code in PR #959:
URL: https://github.com/apache/parquet-mr/pull/959#discussion_r984394753
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java:
##########
@@ -244,16 +272,60 @@ protected CompressionCodec getCodec(CompressionCodecName
codecName) {
}
}
+ /**
+ * Modified for https://issues.apache.org/jira/browse/PARQUET-2126
+ * This releases all cached instances of all compressors and decompressors
created by all threads that share
+ * this CodeFactory instance.
+ * Note: A problem might occur if release() were called while some codec
instances were still in use, but it
+ * would not make sense to call close() or release() on a shared
CodecFactory while some threads are still
+ * actively using it. The usage pattern should be:
Review Comment:
It would be also good to have a look at an example of such callers, eg in
Apache Spark, to start analyzing the implications.
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java:
##########
@@ -244,16 +272,60 @@ protected CompressionCodec getCodec(CompressionCodecName
codecName) {
}
}
+ /**
+ * Modified for https://issues.apache.org/jira/browse/PARQUET-2126
+ * This releases all cached instances of all compressors and decompressors
created by all threads that share
+ * this CodeFactory instance.
+ * Note: A problem might occur if release() were called while some codec
instances were still in use, but it
+ * would not make sense to call close() or release() on a shared
CodecFactory while some threads are still
+ * actively using it. The usage pattern should be:
Review Comment:
regarding the code parts that call this release() method. For those parts
inside the parquet-mr codebase, can some of them implement/enforce this
pattern fully internally? I'd guess most/all of them eventually pass this
responsibility to the "app" code above parquet-mr API; so probably this pattern
documentation should be moved/copied/referenced in these APIs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]