[
https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768418#comment-17768418
]
ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------
amousavigourabi commented on code in PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#discussion_r1335214709
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java:
##########
@@ -167,13 +169,13 @@ public float getProgress() throws IOException, InterruptedException {
   public void initialize(ParquetFileReader reader, ParquetReadOptions options) {
     // copy custom configuration to the Configuration passed to the ReadSupport
-    Configuration conf = new Configuration();
-    if (options instanceof HadoopReadOptions) {
-      conf = ((HadoopReadOptions) options).getConf();
-    }
+    ParquetConfiguration conf = Objects.requireNonNull(options).getConfiguration();
     for (String property : options.getPropertyNames()) {
       conf.set(property, options.getProperty(property));
     }
+    for (Map.Entry<String, String> property : new Configuration()) {
Review Comment:
As for the Hadoop-specific stuff, I agree that duplicating it here would be a
bit silly. However, this class, `InternalParquetRecordReader`, is part of the
read/write API that we're trying to address. That the read/write API lives in
parquet-hadoop is somewhat unfortunate, but changing that at this point would
break downstream code.
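To illustrate the interface-layer idea being discussed, here is a minimal sketch of a Hadoop-free configuration abstraction. The names `ParquetConfiguration` and `PlainParquetConfiguration` mirror what the PR discussion suggests, but the exact interface shape here is an assumption, not the PR's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a minimal configuration interface that Parquet code
// could depend on instead of org.apache.hadoop.conf.Configuration.
interface ParquetConfiguration {
  void set(String name, String value);

  String get(String name);
}

// A plain-Java implementation carrying properties in a Map, with no Hadoop
// dependency. A Hadoop-backed implementation could wrap Configuration and
// satisfy the same interface for backward compatibility.
class PlainParquetConfiguration implements ParquetConfiguration {
  private final Map<String, String> props = new HashMap<>();

  @Override
  public void set(String name, String value) {
    props.put(name, value);
  }

  @Override
  public String get(String name) {
    return props.get(name);
  }
}

public class Demo {
  public static void main(String[] args) {
    ParquetConfiguration conf = new PlainParquetConfiguration();
    conf.set("parquet.example.property", "value");
    System.out.println(conf.get("parquet.example.property"));
  }
}
```

With such a layer, `ParquetReadOptions.getConfiguration()` can return the abstraction, and callers that never touch Hadoop-specific features avoid the Hadoop classpath entirely.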
> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
> Key: PARQUET-2347
> URL: https://issues.apache.org/jira/browse/PARQUET-2347
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Atour Mousavi Gourabi
> Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as its Configuration
> class, which is used throughout Parquet's reading and writing logic. If we
> introduce our own interface for this, it could eventually allow users to use
> Parquet's readers and writers without the Hadoop dependency.
> To preserve backward compatibility and avoid breaking downstream projects,
> the constructors and methods taking Hadoop's Configuration should be
> preserved for the time being, though I would favour deprecating them in the
> near future.
> This is part of an effort that has been [discussed on the dev mailing
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)