[ https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774964#comment-17774964 ]
ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------

amousavigourabi commented on code in PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#discussion_r1358432821


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/api/ReadSupport.java:
##########
@@ -101,6 +120,24 @@ abstract public RecordMaterializer<T> prepareForRead(
       MessageType fileSchema,
       ReadContext readContext);
 
+  /**
+   * called in {@link org.apache.hadoop.mapreduce.RecordReader#initialize(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)} in the back end
+   * the returned RecordMaterializer will materialize the records and add them to the destination
+   *
+   * @param configuration the configuration
+   * @param keyValueMetaData the app specific metadata from the file
+   * @param fileSchema the schema of the file
+   * @param readContext returned by the init method
+   * @return the recordMaterializer that will materialize the records
+   */
+  public RecordMaterializer<T> prepareForRead(
+      ParquetConfiguration configuration,
+      Map<String, String> keyValueMetaData,
+      MessageType fileSchema,
+      ReadContext readContext) {
+    throw new UnsupportedOperationException("Override prepareForRead(ParquetConfiguration, Map<String, String>, MessageType, ReadContext)");

Review Comment:
   I follow the example set by `ReadSupport#init(Configuration, Map, MessageType)`. As this error will not occur unless you are implementing your own `ReadSupport` class, I am not sure whether there needs to be that much more information in the exception. I'll add a reference to the `ReadSupport` class though.

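
For readers following the thread, the pattern under review (a new overload that is concrete and throws by default, so that existing subclasses keep compiling) can be sketched roughly as below. This is only an illustration with hypothetical stand-in types: `SimpleConf`, `LegacyConf`, `ReadSupportLike`, `LegacyOnly` and `Migrated` are invented for the example and are not Parquet or Hadoop classes, and the method signatures are shortened compared with the real `ReadSupport`.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadSupportSketch {

  /** Hypothetical stand-in for a Hadoop-free configuration interface. */
  interface SimpleConf {
    String get(String key);
  }

  /** Hypothetical stand-in for the existing, Hadoop-style configuration class. */
  static class LegacyConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
      values.put(key, value);
    }

    String get(String key) {
      return values.get(key);
    }
  }

  /** Simplified analogue of an abstract read-support base class. */
  abstract static class ReadSupportLike<T> {
    // Pre-existing abstract method: every subclass already implements it.
    abstract T prepareForRead(LegacyConf conf, Map<String, String> keyValueMetaData);

    // New overload: deliberately NOT abstract, so subclasses written before
    // it existed still compile. It throws until a subclass overrides it,
    // and the message names the class and the method to override.
    T prepareForRead(SimpleConf conf, Map<String, String> keyValueMetaData) {
      throw new UnsupportedOperationException(
          "Override ReadSupportLike#prepareForRead(SimpleConf, Map<String, String>)");
    }
  }

  /** A subclass that only knows about the old configuration type. */
  static class LegacyOnly extends ReadSupportLike<String> {
    @Override
    String prepareForRead(LegacyConf conf, Map<String, String> keyValueMetaData) {
      return "legacy:" + conf.get("name");
    }
  }

  /** A subclass that has been migrated to also accept the new interface. */
  static class Migrated extends LegacyOnly {
    @Override
    String prepareForRead(SimpleConf conf, Map<String, String> keyValueMetaData) {
      return "new:" + conf.get("name");
    }
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new HashMap<>();
    SimpleConf conf = key -> "name".equals(key) ? "demo" : null;

    // The migrated subclass handles the new configuration type.
    System.out.println(new Migrated().prepareForRead(conf, metadata));

    // The unmigrated subclass falls through to the throwing default.
    try {
      new LegacyOnly().prepareForRead(conf, metadata);
    } catch (UnsupportedOperationException e) {
      System.out.println("not overridden: " + e.getMessage());
    }
  }
}
```

The trade-off in this sketch is that a missing override surfaces at run time rather than at compile time, which is why the wording of the exception message is the point being discussed in the review.
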
> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
>                 Key: PARQUET-2347
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2347
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Atour Mousavi Gourabi
>            Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as its Configuration
> class, which is used throughout Parquet's reading and writing logic. If we
> include our own interface for this, this could potentially allow users to use
> Parquet's readers and writers without the Hadoop dependency later on.
> In order to preserve backward compatibility and avoid breaking downstream
> projects, the constructors and methods using Hadoop's constructor should be
> preserved for the time being, though I would favour deprecation in the near
> future.
> This is part of an effort that has been [discussed on the dev mailing
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)