[
https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768418#comment-17768418
]
ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------
amousavigourabi commented on code in PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#discussion_r1335214709
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java:
##########
@@ -167,13 +169,13 @@ public float getProgress() throws IOException, InterruptedException {
   public void initialize(ParquetFileReader reader, ParquetReadOptions options) {
     // copy custom configuration to the Configuration passed to the ReadSupport
-    Configuration conf = new Configuration();
-    if (options instanceof HadoopReadOptions) {
-      conf = ((HadoopReadOptions) options).getConf();
-    }
+    ParquetConfiguration conf = Objects.requireNonNull(options).getConfiguration();
     for (String property : options.getPropertyNames()) {
       conf.set(property, options.getProperty(property));
     }
+    for (Map.Entry<String, String> property : new Configuration()) {
Review Comment:
As for the Hadoop-specific stuff, I agree that duplicating it here would be a
bit silly. However, this class, `InternalParquetRecordReader`, is part of the
read/write API that we're trying to address. That the read/write API lives in
parquet-hadoop is somewhat unfortunate, but changing that at this point would
break downstream code.
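To illustrate the interface-layer idea being discussed, here is a minimal sketch of a Hadoop-free configuration abstraction. The names `ParquetConfiguration` and `PlainParquetConfiguration` mirror what the PR discussion suggests, but the exact interface shape here is an assumption, not the PR's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a minimal configuration interface that Parquet code
// could depend on instead of org.apache.hadoop.conf.Configuration.
interface ParquetConfiguration {
  void set(String name, String value);

  String get(String name);
}

// A plain-Java implementation carrying properties in a Map, with no Hadoop
// dependency. A Hadoop-backed implementation could wrap Configuration and
// satisfy the same interface for backward compatibility.
class PlainParquetConfiguration implements ParquetConfiguration {
  private final Map<String, String> props = new HashMap<>();

  @Override
  public void set(String name, String value) {
    props.put(name, value);
  }

  @Override
  public String get(String name) {
    return props.get(name);
  }
}

public class Demo {
  public static void main(String[] args) {
    ParquetConfiguration conf = new PlainParquetConfiguration();
    conf.set("parquet.example.property", "value");
    System.out.println(conf.get("parquet.example.property"));
  }
}
```

With such a layer, `ParquetReadOptions.getConfiguration()` can return the abstraction, and callers that never touch Hadoop-specific features avoid the Hadoop classpath entirely.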
> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
> Key: PARQUET-2347
> URL: https://issues.apache.org/jira/browse/PARQUET-2347
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Atour Mousavi Gourabi
> Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as its Configuration
> class, which is used throughout Parquet's reading and writing logic. If we
> introduce our own interface for this, it could eventually allow users to use
> Parquet's readers and writers without the Hadoop dependency.
> To preserve backward compatibility and avoid breaking downstream projects,
> the constructors and methods taking Hadoop's Configuration should be
> preserved for the time being, though I would favour deprecating them in the
> near future.
> This is part of an effort that has been [discussed on the dev mailing
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)