[jira] [Commented] (PARQUET-2347) Add interface layer between Parquet and Hadoop Configuration

ASF GitHub Bot (Jira) Fri, 13 Oct 2023 08:46:19 -0700


    [ https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774964#comment-17774964 ]


ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------

amousavigourabi commented on code in PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#discussion_r1358432821


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/api/ReadSupport.java:
##########
@@ -101,6 +120,24 @@ abstract public RecordMaterializer<T> prepareForRead(
           MessageType fileSchema,
           ReadContext readContext);
 
+  /**
+   * called in {@link org.apache.hadoop.mapreduce.RecordReader#initialize(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)} in the back end
+   * the returned RecordMaterializer will materialize the records and add them to the destination
+   *
+   * @param configuration    the configuration
+   * @param keyValueMetaData the app specific metadata from the file
+   * @param fileSchema       the schema of the file
+   * @param readContext      returned by the init method
+   * @return the recordMaterializer that will materialize the records
+   */
+  public RecordMaterializer<T> prepareForRead(
+      ParquetConfiguration configuration,
+      Map<String, String> keyValueMetaData,
+      MessageType fileSchema,
+      ReadContext readContext) {
+    throw new UnsupportedOperationException("Override prepareForRead(ParquetConfiguration, Map<String, String>, MessageType, ReadContext)");

Review Comment:
   I followed the example set by `ReadSupport#init(Configuration, Map, 
MessageType)`. Since this error only occurs when you implement your own 
`ReadSupport` class, I am not sure the exception needs much more 
information. I'll add a reference to the `ReadSupport` class, though.
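
The pattern under discussion can be sketched in isolation. The following is a minimal illustration, not the parquet-mr code: `SketchConfiguration`, `SketchReadSupport`, `LegacySupport`, and `UpdatedSupport` are hypothetical stand-ins, and the method signature is shortened. It assumes the intent is that the new overload gets a throwing default body, so existing subclasses keep compiling and only fail if the new overload is actually called without having been overridden.

```java
import java.util.Map;

// Demo entry point; all types in this file are hypothetical stand-ins.
class OverloadSketchDemo {
  public static void main(String[] args) {
    SketchConfiguration conf = key -> "v1:";
    // A subclass that overrides the new overload works as expected.
    System.out.println(new UpdatedSupport().prepareForRead(conf, Map.of("writer", "demo")));
    // A subclass that predates the overload still compiles, but fails when it is called.
    try {
      new LegacySupport().prepareForRead(conf, Map.of());
    } catch (UnsupportedOperationException e) {
      System.out.println("Not overridden: " + e.getMessage());
    }
  }
}

// Stand-in for a configuration abstraction (assumption, not the real interface).
interface SketchConfiguration {
  String get(String key);
}

abstract class SketchReadSupport<T> {
  // New overload with a default body: adding it does not break existing subclasses.
  // The message names the class so implementers know where to look.
  public T prepareForRead(SketchConfiguration configuration, Map<String, String> keyValueMetaData) {
    throw new UnsupportedOperationException(
        "Override SketchReadSupport.prepareForRead(SketchConfiguration, Map<String, String>)");
  }
}

// Written before the overload existed; does not override it.
class LegacySupport extends SketchReadSupport<String> {
}

// Updated implementation that overrides the new overload.
class UpdatedSupport extends SketchReadSupport<String> {
  @Override
  public String prepareForRead(SketchConfiguration configuration, Map<String, String> keyValueMetaData) {
    return configuration.get("sketch.prefix") + keyValueMetaData.get("writer");
  }
}
```

The trade-off is that a missing override surfaces at run time rather than at compile time, which is why the wording of the exception message matters to implementers.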





> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
>                 Key: PARQUET-2347
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2347
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Atour Mousavi Gourabi
>            Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as its Configuration 
> class, which is used throughout Parquet's reading and writing logic. 
> Introducing our own interface for it could later allow users to use 
> Parquet's readers and writers without the Hadoop dependency.
> To preserve backward compatibility and avoid breaking downstream 
> projects, the constructors and methods using Hadoop's Configuration should be 
> preserved for the time being, though I would favour deprecation in the near 
> future.
> This is part of an effort that has been [discussed on the dev mailing 
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].
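
The idea in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the API proposed in the PR: `ConfigView`, `LegacyConfigView`, `MapConfigView`, `SketchReader`, and the key `sketch.batch.size` are all hypothetical, and `java.util.Properties` stands in for Hadoop's `Configuration` so the example has no external dependency. It shows an interface that callers depend on, an adapter over the legacy type, and a legacy constructor kept for backward compatibility that delegates to the new one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Demo entry point; every type in this file is a hypothetical stand-in.
class InterfaceLayerDemo {
  public static void main(String[] args) {
    Properties legacy = new Properties();
    legacy.setProperty("sketch.batch.size", "64");
    // Old call site: still compiles because the legacy constructor is preserved.
    System.out.println(new SketchReader(legacy).batchSize());
    // New call site: no dependency on the legacy configuration type.
    System.out.println(new SketchReader(new MapConfigView().set("sketch.batch.size", "128")).batchSize());
  }
}

// The interface layer: readers and writers depend only on this.
interface ConfigView {
  String get(String key, String defaultValue);
}

// Adapter over the legacy type (Properties stands in for Hadoop's Configuration).
class LegacyConfigView implements ConfigView {
  private final Properties legacy;

  LegacyConfigView(Properties legacy) {
    this.legacy = legacy;
  }

  @Override
  public String get(String key, String defaultValue) {
    return legacy.getProperty(key, defaultValue);
  }
}

// Dependency-free implementation backed by a plain map.
class MapConfigView implements ConfigView {
  private final Map<String, String> values = new HashMap<>();

  MapConfigView set(String key, String value) {
    values.put(key, value);
    return this;
  }

  @Override
  public String get(String key, String defaultValue) {
    return values.getOrDefault(key, defaultValue);
  }
}

class SketchReader {
  private final ConfigView conf;

  // New constructor: takes the interface.
  SketchReader(ConfigView conf) {
    this.conf = conf;
  }

  // Legacy constructor kept for backward compatibility; wraps and delegates.
  SketchReader(Properties legacy) {
    this(new LegacyConfigView(legacy));
  }

  int batchSize() {
    return Integer.parseInt(conf.get("sketch.batch.size", "1024"));
  }
}
```

Keeping the legacy constructor as a thin delegate means it could later be marked deprecated without changing behaviour for existing callers.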



