[jira] [Commented] (PARQUET-2347) Add interface layer between Parquet and Hadoop Configuration

ASF GitHub Bot (Jira) Sun, 24 Sep 2023 09:25:40 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768413#comment-17768413
 ]


ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------

amousavigourabi commented on code in PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#discussion_r1335212974


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/ConfigurationUtil.java:
##########
@@ -41,4 +49,18 @@ public static Class<?> getClassFromConfig(Configuration configuration, String co
     }
   }
 
+  public static Configuration createHadoopConfiguration(ParquetConfiguration conf) {
+    if (conf == null) {
+      return new Configuration();
+    }
+    if (conf instanceof HadoopParquetConfiguration) {
+      return ((HadoopParquetConfiguration) conf).getConfiguration();
+    }
+    Configuration configuration = new Configuration();

Review Comment:
   A HadoopParquetConfiguration is just a wrapper around Hadoop's Configuration, 
so a user who relies on it has not yet decoupled from Hadoop. Users who do want 
to decouple can implement their own ParquetConfiguration that does not rely on 
Hadoop's Configuration (a simple implementation could also be added later; this 
PR was already getting a bit large for that). Some code, mainly around the 
codecs, still _needs_ a Hadoop Configuration. While we are still removing these 
last references to Hadoop, it is therefore important that we can obtain such an 
instance from any ParquetConfiguration, so that nothing breaks.
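
   The conversion described above can be sketched roughly as follows. This is a 
minimal, self-contained illustration, not the code in the PR: `ParquetConf`, 
`HadoopLikeConfiguration`, `WrappingConf`, `MapBackedConf` and `toHadoopLike` 
are hypothetical stand-ins for ParquetConfiguration, Hadoop's Configuration, 
HadoopParquetConfiguration, a user-written Hadoop-free implementation, and 
createHadoopConfiguration respectively. The entry-copying fallback is an 
assumption about what follows the last line of the hunk shown above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class ConfigurationBridgeSketch {

  /** Hypothetical stand-in for ParquetConfiguration, reduced to what this sketch needs. */
  interface ParquetConf extends Iterable<Map.Entry<String, String>> {
    void set(String name, String value);
    String get(String name);
  }

  /** Hypothetical stand-in for Hadoop's Configuration (only set/get/iteration). */
  static final class HadoopLikeConfiguration implements Iterable<Map.Entry<String, String>> {
    private final Map<String, String> props = new LinkedHashMap<>();
    void set(String name, String value) { props.put(name, value); }
    String get(String name) { return props.get(name); }
    @Override public Iterator<Map.Entry<String, String>> iterator() {
      return props.entrySet().iterator();
    }
  }

  /** Analogue of HadoopParquetConfiguration: a thin wrapper, so not yet decoupled. */
  static final class WrappingConf implements ParquetConf {
    private final HadoopLikeConfiguration delegate;
    WrappingConf(HadoopLikeConfiguration delegate) { this.delegate = delegate; }
    HadoopLikeConfiguration getConfiguration() { return delegate; }
    @Override public void set(String name, String value) { delegate.set(name, value); }
    @Override public String get(String name) { return delegate.get(name); }
    @Override public Iterator<Map.Entry<String, String>> iterator() {
      return delegate.iterator();
    }
  }

  /** A user-written implementation that does not touch the Hadoop-like class at all. */
  static final class MapBackedConf implements ParquetConf {
    private final Map<String, String> props = new LinkedHashMap<>();
    @Override public void set(String name, String value) { props.put(name, value); }
    @Override public String get(String name) { return props.get(name); }
    @Override public Iterator<Map.Entry<String, String>> iterator() {
      return props.entrySet().iterator();
    }
  }

  /** Mirrors the shape of createHadoopConfiguration: null, unwrap, or copy. */
  static HadoopLikeConfiguration toHadoopLike(ParquetConf conf) {
    if (conf == null) {
      return new HadoopLikeConfiguration();
    }
    if (conf instanceof WrappingConf) {
      // Already backed by a Hadoop-like instance: hand back the wrapped object.
      return ((WrappingConf) conf).getConfiguration();
    }
    // Assumed fallback: copy every entry into a fresh instance.
    HadoopLikeConfiguration configuration = new HadoopLikeConfiguration();
    for (Map.Entry<String, String> entry : conf) {
      configuration.set(entry.getKey(), entry.getValue());
    }
    return configuration;
  }

  public static void main(String[] args) {
    MapBackedConf conf = new MapBackedConf();
    conf.set("parquet.compression", "SNAPPY");
    HadoopLikeConfiguration bridged = toHadoopLike(conf);
    System.out.println(bridged.get("parquet.compression")); // prints SNAPPY
  }
}
```

With this shape, code that still requires the Hadoop-style object (such as the 
codec paths mentioned above) keeps working for both kinds of configuration, 
while only the wrapper case shares state with the original instance.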





> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
>                 Key: PARQUET-2347
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2347
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Atour Mousavi Gourabi
>            Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as its Configuration 
> class, which is used throughout Parquet's reading and writing logic. 
> Introducing our own interface for it could later allow users to use 
> Parquet's readers and writers without the Hadoop dependency.
> In order to preserve backward compatibility and avoid breaking downstream 
> projects, the constructors and methods using Hadoop's Configuration should be 
> preserved for the time being, though I would favour deprecating them in the 
> near future.
> This is part of an effort that has been [discussed on the dev mailing 
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
