[ https://issues.apache.org/jira/browse/PARQUET-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767974#comment-17767974 ]
Atour Mousavi Gourabi commented on PARQUET-2353:
------------------------------------------------

Hi Fokko,

As far as I'm aware, [https://github.com/apache/parquet-mr/pull/1074] allows for not directly instantiating a Hadoop-based CompressionCodecFactory when reading, iff the user passes their own factory. Currently, however, we do not have any unhadooped CompressionCodecFactory implementations AFAIK (both CodecFactory and DirectCodecFactory will have to deal with a Hadoop CompressionCodec at some point). As for the specific codecs, CompressionCodecName refers to 4 codecs from Hadoop itself and 3 that are implemented in Parquet but still implement both the Configurable and CompressionCodec interfaces from Hadoop. As I see it, this means users would have to implement quite a bit of this themselves, which is a pretty big ask.

If nobody minds, I'd like to work on this after [https://github.com/apache/parquet-mr/pull/1141] is taken care of.

> Avoid Hadoop interfaces and classes in codecs
> ---------------------------------------------
>
>                 Key: PARQUET-2353
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2353
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Atour Mousavi Gourabi
>            Priority: Minor
>
> Currently the codecs implemented by Parquet implement the Hadoop Configurable
> and CompressionCodec interfaces. As part of the effort to decouple from
> Hadoop there need to be alternatives to these Hadoop implementations such
> that users are not forced to load Hadoop classes for this purpose at runtime.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)