[ 
https://issues.apache.org/jira/browse/PARQUET-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767974#comment-17767974
 ] 

Atour Mousavi Gourabi commented on PARQUET-2353:
------------------------------------------------

Hi Fokko, as far as I'm aware, [https://github.com/apache/parquet-mr/pull/1074] 
makes it possible to avoid directly instantiating a Hadoop-based 
CompressionCodecFactory when reading, but only if the user passes their own 
factory. Currently, however, we do not have any Hadoop-free 
CompressionCodecFactory implementations, AFAIK (both CodecFactory and 
DirectCodecFactory have to deal with a Hadoop CompressionCodec at some 
point). As for the specific codecs, CompressionCodecName refers to 4 codecs 
from Hadoop itself and 3 that are implemented in Parquet but still implement 
both the Configurable and CompressionCodec interfaces from Hadoop. As I see 
it, this means users would have to implement quite a bit of this themselves, 
which is a pretty big ask. If nobody minds, I'd like to work on this after 
[https://github.com/apache/parquet-mr/pull/1141] is taken care of.
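
To make the size of that ask concrete, here is a minimal sketch of what a 
Hadoop-free factory could look like. It is only an illustration under stated 
assumptions: DemoCodec, DemoCompressor, DemoDecompressor, DemoCodecFactory and 
JdkCodecFactory are hypothetical names that mimic the general shape of a codec 
factory; they are not the parquet-mr interfaces. The ZLIB entry is plain 
java.util.zip deflate and is not claimed to match any Parquet codec's on-disk 
format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative sketch only: every type below is hypothetical and merely
// mirrors the general shape of a codec factory. None of it is parquet-mr API.
final class HadoopFreeCodecSketch {

    // Hypothetical codec names; ZLIB here is plain java.util.zip deflate.
    enum DemoCodec { UNCOMPRESSED, ZLIB }

    interface DemoCompressor { byte[] compress(byte[] raw); }

    interface DemoDecompressor { byte[] decompress(byte[] packed, int uncompressedSize); }

    interface DemoCodecFactory {
        DemoCompressor getCompressor(DemoCodec codec);
        DemoDecompressor getDecompressor(DemoCodec codec);
    }

    // A factory that depends on the JDK alone: no Configuration, no Configurable.
    static final class JdkCodecFactory implements DemoCodecFactory {
        @Override
        public DemoCompressor getCompressor(DemoCodec codec) {
            switch (codec) {
                case UNCOMPRESSED: return raw -> raw.clone();
                case ZLIB: return HadoopFreeCodecSketch::deflate;
                default: throw new IllegalArgumentException("Unsupported codec: " + codec);
            }
        }

        @Override
        public DemoDecompressor getDecompressor(DemoCodec codec) {
            switch (codec) {
                case UNCOMPRESSED: return (packed, size) -> packed.clone();
                case ZLIB: return HadoopFreeCodecSketch::inflate;
                default: throw new IllegalArgumentException("Unsupported codec: " + codec);
            }
        }
    }

    // Compresses the whole input into a freshly allocated array.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Decompresses into an array of the size the caller expects.
    static byte[] inflate(byte[] packed, int uncompressedSize) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            byte[] out = new byte[uncompressedSize];
            int off = 0;
            while (off < out.length) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0) break; // stream finished early or input exhausted
                off += n;
            }
            if (off != uncompressedSize) {
                throw new IllegalStateException("Unexpected uncompressed size: " + off);
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        DemoCodecFactory factory = new JdkCodecFactory();
        byte[] raw = "parquet parquet parquet parquet".getBytes(StandardCharsets.UTF_8);
        byte[] packed = factory.getCompressor(DemoCodec.ZLIB).compress(raw);
        byte[] restored = factory.getDecompressor(DemoCodec.ZLIB).decompress(packed, raw.length);
        System.out.println("round trip ok: " + Arrays.equals(raw, restored));
    }
}
```

Even this toy version needs a compressor, a decompressor and a factory per 
codec, and a real implementation would additionally have to match each 
codec's existing framing and buffer handling.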

> Avoid Hadoop interfaces and classes in codecs
> ---------------------------------------------
>
>                 Key: PARQUET-2353
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2353
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Atour Mousavi Gourabi
>            Priority: Minor
>
> Currently, the codecs implemented by Parquet implement the Hadoop 
> Configurable and CompressionCodec interfaces. As part of the effort to 
> decouple from Hadoop, there need to be alternatives to these Hadoop-based 
> implementations so that users are not forced to load Hadoop classes at runtime.



