[https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768407#comment-17768407]
ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------
amousavigourabi commented on code in PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#discussion_r1335208155
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java:
##########
@@ -246,9 +262,9 @@ protected CompressionCodec getCodec(CompressionCodecName codecName) {
codecClass = Class.forName(codecClassName);
} catch (ClassNotFoundException e) {
// Try to load the class using the job classloader
- codecClass = configuration.getClassLoader().loadClass(codecClassName);
+ codecClass = new Configuration(false).getClassLoader().loadClass(codecClassName);
}
- codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, configuration);
+ codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, ConfigurationUtil.createHadoopConfiguration(configuration));
Review Comment:
This PR removes only _part_ of the Hadoop dependency. For the codecs, at least,
we still rely on Hadoop out of the box (see PARQUET-2353). As far as I know,
using the uncompressed codec should not cause any Hadoop classes to be loaded,
which would otherwise require the Hadoop runtime dependency.
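The fallback in the diff (try `Class.forName`, then ask a class loader) and the point about the uncompressed codec can be sketched without Hadoop on the classpath. This is a minimal, hypothetical illustration, not the parquet-mr implementation: `CodecLoader`, `loadCodecClass` and `newCodec` are invented names, the thread context class loader stands in for the `new Configuration(false).getClassLoader()` call in the patch, and a `null` class name is assumed to model the uncompressed case.

```java
// Hypothetical helper illustrating the codec-loading pattern from the diff.
final class CodecLoader {
  private CodecLoader() {}

  // Resolve a codec class by name: first via Class.forName, then via a
  // fallback loader. The patch uses new Configuration(false).getClassLoader();
  // here the thread context class loader stands in for it (an assumption).
  static Class<?> loadCodecClass(String className) throws ClassNotFoundException {
    try {
      return Class.forName(className);
    } catch (ClassNotFoundException e) {
      ClassLoader fallback = Thread.currentThread().getContextClassLoader();
      if (fallback == null) {
        throw e;
      }
      return fallback.loadClass(className);
    }
  }

  // Returns null when there is no codec class name (assumed to model the
  // uncompressed codec), so on that path no codec class, and therefore no
  // Hadoop class, is ever loaded.
  static Object newCodec(String className) throws ReflectiveOperationException {
    if (className == null) {
      return null;
    }
    Class<?> codecClass = loadCodecClass(className);
    // Plain no-arg reflection; Hadoop's ReflectionUtils.newInstance would
    // additionally hand its Configuration to Configurable instances.
    return codecClass.getDeclaredConstructor().newInstance();
  }
}
```

Calling `CodecLoader.newCodec(null)` returns `null` without loading anything, while `CodecLoader.newCodec("java.util.ArrayList")` exercises the reflective path with a JDK class in place of a real codec.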
> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
> Key: PARQUET-2347
> URL: https://issues.apache.org/jira/browse/PARQUET-2347
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Atour Mousavi Gourabi
> Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as its Configuration
> class, which is used throughout Parquet's reading and writing logic.
> Introducing our own interface for it could eventually allow users to use
> Parquet's readers and writers without the Hadoop dependency.
> To preserve backward compatibility and avoid breaking downstream projects,
> the constructors and methods that take Hadoop's Configuration should be
> preserved for the time being, though I would favour deprecating them in the
> near future.
> This is part of an effort that has been [discussed on the dev mailing
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].
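The interface layer proposed above can be sketched as a small Parquet-owned configuration type with a Hadoop-free implementation. This is a hypothetical illustration only: `ParquetConfig` and `MapParquetConfig` are invented names, not parquet-mr's API. A Hadoop-backed implementation would delegate `get`/`set` to `org.apache.hadoop.conf.Configuration`, and the legacy constructors taking a Hadoop `Configuration` could wrap it; that adapter is omitted so the sketch compiles without Hadoop.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Parquet-owned configuration interface (illustrative names).
interface ParquetConfig {
  String get(String name);

  void set(String name, String value);

  // Typed accessors built on top of get(), so implementations only need
  // to provide raw string storage.
  default String get(String name, String defaultValue) {
    String value = get(name);
    return value == null ? defaultValue : value;
  }

  default boolean getBoolean(String name, boolean defaultValue) {
    String value = get(name);
    return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
  }

  default int getInt(String name, int defaultValue) {
    String value = get(name);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }
}

// Map-backed implementation with no Hadoop dependency.
final class MapParquetConfig implements ParquetConfig {
  private final Map<String, String> values = new HashMap<>();

  @Override
  public String get(String name) {
    return values.get(name);
  }

  @Override
  public void set(String name, String value) {
    values.put(name, value);
  }
}
```

Readers and writers coded against such an interface would never reference Hadoop types directly, which is what would let the Hadoop dependency become optional later. Any property keys used with it (for example `parquet.example.flag`) are placeholders here.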