[
https://issues.apache.org/jira/browse/PARQUET-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767974#comment-17767974
]
Atour Mousavi Gourabi commented on PARQUET-2353:
------------------------------------------------
Hi Fokko, as far as I'm aware, [https://github.com/apache/parquet-mr/pull/1074]
makes it possible to avoid directly instantiating a Hadoop-based
CompressionCodecFactory when reading, but only if the user passes their own
factory. Currently, however, we do not have any Hadoop-free
CompressionCodecFactory implementations AFAIK (both CodecFactory and
DirectCodecFactory have to deal with a Hadoop CompressionCodec at some point).
As for the specific codecs, CompressionCodecName refers to 4 codecs from Hadoop
itself and 3 that are implemented in Parquet but still implement both the
Configurable and CompressionCodec interfaces from Hadoop. As I see it, this
means the user would have to implement quite a bit of this themselves, which is
a pretty big ask. If nobody minds, I'd like to work on this after
[https://github.com/apache/parquet-mr/pull/1141] is taken care of.
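
To make the "pretty big ask" concrete, here is a rough sketch of what a
codec factory without any Hadoop dependency could look like. Everything in it
is an assumption made for illustration: PlainCompressor, PlainDecompressor,
PlainCodecFactory and JdkGzipCodecFactory are hypothetical names, not the
parquet-mr interfaces, and the sketch only covers GZIP using java.util.zip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-ins for illustration only; NOT the parquet-mr interfaces.
interface PlainCompressor {
  byte[] compress(byte[] raw) throws IOException;
}

interface PlainDecompressor {
  byte[] decompress(byte[] compressed) throws IOException;
}

interface PlainCodecFactory {
  PlainCompressor getCompressor(String codecName);

  PlainDecompressor getDecompressor(String codecName);
}

// GZIP backed by java.util.zip only, so no Hadoop Configurable or
// CompressionCodec class has to be loaded at runtime.
final class JdkGzipCodecFactory implements PlainCodecFactory {

  @Override
  public PlainCompressor getCompressor(String codecName) {
    requireGzip(codecName);
    return raw -> {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      // Closing the stream writes the GZIP trailer before we read the bytes.
      try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
        gzip.write(raw);
      }
      return out.toByteArray();
    };
  }

  @Override
  public PlainDecompressor getDecompressor(String codecName) {
    requireGzip(codecName);
    return compressed -> {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      try (GZIPInputStream gzip =
          new GZIPInputStream(new ByteArrayInputStream(compressed))) {
        byte[] buffer = new byte[4096];
        int read;
        // Copy until the decompressed stream is exhausted.
        while ((read = gzip.read(buffer)) != -1) {
          out.write(buffer, 0, read);
        }
      }
      return out.toByteArray();
    };
  }

  // This sketch deliberately supports a single codec name.
  private static void requireGzip(String codecName) {
    if (!"GZIP".equals(codecName)) {
      throw new IllegalArgumentException(
          "Unsupported codec in this sketch: " + codecName);
    }
  }
}
```

A caller would ask the factory for a compressor and a decompressor by codec
name and round-trip a byte array through them. A real replacement would also
have to cover the other codecs and Parquet's own buffer types, which is where
most of the work lies.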
> Avoid Hadoop interfaces and classes in codecs
> ---------------------------------------------
>
> Key: PARQUET-2353
> URL: https://issues.apache.org/jira/browse/PARQUET-2353
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Atour Mousavi Gourabi
> Priority: Minor
>
> Currently, the codecs implemented by Parquet implement the Hadoop Configurable
> and CompressionCodec interfaces. As part of the effort to decouple from
> Hadoop, there need to be alternatives to these Hadoop implementations so
> that users are not forced to load Hadoop classes for this purpose at runtime.