[
https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768413#comment-17768413
]
ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------
amousavigourabi commented on code in PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#discussion_r1335212974
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/ConfigurationUtil.java:
##########
@@ -41,4 +49,18 @@ public static Class<?> getClassFromConfig(Configuration configuration, String co
}
}
+ public static Configuration createHadoopConfiguration(ParquetConfiguration conf) {
+ if (conf == null) {
+ return new Configuration();
+ }
+ if (conf instanceof HadoopParquetConfiguration) {
+ return ((HadoopParquetConfiguration) conf).getConfiguration();
+ }
+ Configuration configuration = new Configuration();
Review Comment:
A user of HadoopParquetConfiguration has not yet decoupled from Hadoop, since
it is just a wrapper around Hadoop's Configuration. Users who do want to
decouple can implement their own ParquetConfiguration that does not rely on
Hadoop's Configuration (a simple implementation can be added in a follow-up;
this PR was already getting a bit large for that). Some code, mainly around
the codecs, still _needs_ a Hadoop Configuration. While we're still removing
these last references to Hadoop, it is therefore important that we can obtain
such an instance from any ParquetConfiguration, so that nothing breaks.
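To make the conversion described above concrete, here is a minimal,
self-contained sketch of the same three-way branch. It does not use the real
classes: `LegacyConfiguration` stands in for Hadoop's Configuration,
`ParquetConfig` for ParquetConfiguration, `LegacyBackedConfig` for
HadoopParquetConfiguration, and `PlainConfig` for a user-supplied
implementation without Hadoop. All of these names are hypothetical. The diff
above is cut off after `new Configuration()`, so the entry-copying loop in the
last branch is an assumption about what follows, not a quote from the PR.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration.
class LegacyConfiguration implements Iterable<Map.Entry<String, String>> {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
    @Override
    public Iterator<Map.Entry<String, String>> iterator() { return props.entrySet().iterator(); }
}

// Hypothetical stand-in for the ParquetConfiguration interface.
interface ParquetConfig extends Iterable<Map.Entry<String, String>> {
    void set(String key, String value);
    String get(String key);
}

// Wrapper case: still coupled to the legacy class, it only delegates.
class LegacyBackedConfig implements ParquetConfig {
    private final LegacyConfiguration delegate;
    LegacyBackedConfig(LegacyConfiguration delegate) { this.delegate = delegate; }
    LegacyConfiguration getConfiguration() { return delegate; }
    @Override public void set(String key, String value) { delegate.set(key, value); }
    @Override public String get(String key) { return delegate.get(key); }
    @Override public Iterator<Map.Entry<String, String>> iterator() { return delegate.iterator(); }
}

// Decoupled case: a plain map-backed implementation with no legacy dependency.
class PlainConfig implements ParquetConfig {
    private final Map<String, String> props = new HashMap<>();
    @Override public void set(String key, String value) { props.put(key, value); }
    @Override public String get(String key) { return props.get(key); }
    @Override public Iterator<Map.Entry<String, String>> iterator() { return props.entrySet().iterator(); }
}

final class ConfigBridge {
    private ConfigBridge() {}

    // Mirrors the branch structure of createHadoopConfiguration in the diff.
    static LegacyConfiguration toLegacy(ParquetConfig conf) {
        if (conf == null) {
            return new LegacyConfiguration();
        }
        if (conf instanceof LegacyBackedConfig) {
            // Unwrap instead of copying, so changes stay visible to both sides.
            return ((LegacyBackedConfig) conf).getConfiguration();
        }
        // Assumed behaviour: copy every entry into a fresh legacy instance.
        LegacyConfiguration copy = new LegacyConfiguration();
        for (Map.Entry<String, String> entry : conf) {
            copy.set(entry.getKey(), entry.getValue());
        }
        return copy;
    }
}

class ConfigBridgeDemo {
    public static void main(String[] args) {
        PlainConfig plain = new PlainConfig();
        plain.set("example.key", "example-value");
        LegacyConfiguration legacy = ConfigBridge.toLegacy(plain);
        System.out.println(legacy.get("example.key"));
    }
}
```

In this sketch the wrapper case returns the wrapped instance itself, while the
decoupled case returns a snapshot copy, so later changes to a `PlainConfig`
would not show up in the converted object.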
> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
> Key: PARQUET-2347
> URL: https://issues.apache.org/jira/browse/PARQUET-2347
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Atour Mousavi Gourabi
> Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as its Configuration
> class, which is used throughout Parquet's reading and writing logic. If we
> include our own interface for this, this could potentially allow users to use
> Parquet's readers and writers without the Hadoop dependency later on.
> In order to preserve backward compatibility and avoid breaking downstream
> projects, the constructors and methods using Hadoop's Configuration should be
> preserved for the time being, though I would favour deprecation in the near
> future.
> This is part of an effort that has been [discussed on the dev mailing
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)