[ https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768413#comment-17768413 ]
ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------

amousavigourabi commented on code in PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#discussion_r1335212974


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/ConfigurationUtil.java:
##########
@@ -41,4 +49,18 @@ public static Class<?> getClassFromConfig(Configuration configuration, String co
     }
   }

+  public static Configuration createHadoopConfiguration(ParquetConfiguration conf) {
+    if (conf == null) {
+      return new Configuration();
+    }
+    if (conf instanceof HadoopParquetConfiguration) {
+      return ((HadoopParquetConfiguration) conf).getConfiguration();
+    }
+    Configuration configuration = new Configuration();

Review Comment:
   A user of HadoopParquetConfiguration has not yet decoupled from Hadoop, as it is just a wrapper for Configuration. A user who wants to decouple from Hadoop can implement their own ParquetConfiguration that does not rely on Hadoop's Configuration (or a simple implementation can be added afterwards; this PR was already getting a bit large for that).
   
   Some code, mainly around the codecs, still _needs_ a Hadoop Configuration. While we are still removing these last references to Hadoop, it is therefore important that we can get such an instance from a ParquetConfiguration, so that nothing breaks.



> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
>                 Key: PARQUET-2347
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2347
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Atour Mousavi Gourabi
>            Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as its Configuration
> class, which is used throughout Parquet's reading and writing logic. If we
> include our own interface for this, this could potentially allow users to use
> Parquet's readers and writers without the Hadoop dependency later on.
> In order to preserve backward compatibility and avoid breaking downstream
> projects, the constructors and methods using Hadoop's constructor should be
> preserved for the time being, though I would favour deprecation in the near
> future.
> This is part of an effort that has been [discussed on the dev mailing
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
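
The three cases discussed in the review comment (no configuration, a Hadoop-backed wrapper, and a fully decoupled implementation) can be illustrated with a small, dependency-free sketch. Everything here is hypothetical: `FakeHadoopConfiguration` stands in for Hadoop's `Configuration`, `SimpleParquetConfiguration` for the PR's `ParquetConfiguration`, and `WrappingConfiguration` for `HadoopParquetConfiguration`. The entry-by-entry copy in the last branch is an assumption, since the quoted hunk ends before that part of the method.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigBridgeSketch {

    /** Hypothetical stand-in for Hadoop's Configuration (keeps the sketch Hadoop-free). */
    static final class FakeHadoopConfiguration {
        private final Map<String, String> props = new LinkedHashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
        Iterator<Map.Entry<String, String>> iterator() { return props.entrySet().iterator(); }
    }

    /** Hypothetical stand-in for the ParquetConfiguration interface from the PR. */
    interface SimpleParquetConfiguration extends Iterable<Map.Entry<String, String>> {
        void set(String key, String value);
        String get(String key);
    }

    /** Wrapper case: still coupled to the (fake) Hadoop configuration it delegates to. */
    static final class WrappingConfiguration implements SimpleParquetConfiguration {
        private final FakeHadoopConfiguration delegate;
        WrappingConfiguration(FakeHadoopConfiguration delegate) { this.delegate = delegate; }
        FakeHadoopConfiguration getConfiguration() { return delegate; }
        @Override public void set(String key, String value) { delegate.set(key, value); }
        @Override public String get(String key) { return delegate.get(key); }
        @Override public Iterator<Map.Entry<String, String>> iterator() { return delegate.iterator(); }
    }

    /** Decoupled case: a plain map-backed implementation with no Hadoop types at all. */
    static final class MapConfiguration implements SimpleParquetConfiguration {
        private final Map<String, String> props = new LinkedHashMap<>();
        @Override public void set(String key, String value) { props.put(key, value); }
        @Override public String get(String key) { return props.get(key); }
        @Override public Iterator<Map.Entry<String, String>> iterator() { return props.entrySet().iterator(); }
    }

    /** Mirrors the shape of createHadoopConfiguration; the copy loop is an assumption. */
    static FakeHadoopConfiguration toHadoop(SimpleParquetConfiguration conf) {
        if (conf == null) {
            return new FakeHadoopConfiguration();
        }
        if (conf instanceof WrappingConfiguration) {
            // Unwrap instead of copying, so callers share the same instance.
            return ((WrappingConfiguration) conf).getConfiguration();
        }
        FakeHadoopConfiguration result = new FakeHadoopConfiguration();
        for (Map.Entry<String, String> entry : conf) {
            result.set(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        MapConfiguration decoupled = new MapConfiguration();
        decoupled.set("parquet.compression", "SNAPPY");
        // Code that still needs a Hadoop-style configuration can obtain one on demand.
        FakeHadoopConfiguration bridged = toHadoop(decoupled);
        System.out.println(bridged.get("parquet.compression"));
    }
}
```

In this sketch the wrapper branch returns the wrapped instance itself, so later changes made through either object are visible to both, while the decoupled branch returns a snapshot copy.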