[https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768421#comment-17768421]
ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------
amousavigourabi commented on code in PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#discussion_r1335216373
##########
parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java:
##########
@@ -333,13 +406,17 @@ public Builder copy(ParquetReadOptions options) {
public ParquetReadOptions build() {
if (codecFactory == null) {
- codecFactory = HadoopCodecs.newFactory(0);
+ if (conf == null) {
+ codecFactory = HadoopCodecs.newFactory(0);
+ } else {
+ codecFactory = HadoopCodecs.newFactory(conf, 0);
Review Comment:
We still need a way to get the ParquetConfiguration to the HadoopCodecs; for
this, I used a field in ParquetReadOptions. Related to this, I would like to
start a conversation in the future about the options classes, as it seems to me
we could replace them with the more versatile ParquetConfiguration in most, if
not all, places. I bring this up because I think the current flow of passing
the ParquetConfiguration through ParquetReadOptions is a bit hacky.
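To make the flow described in the review comment concrete, here is a minimal, self-contained Java sketch of the pattern: the builder carries an optional configuration field, and `build()` only hands it to the codec factory when one was supplied, mirroring the branching in the diff above. All names here (`Config`, `MapConfig`, `CodecFactory`, `ReadOptions`) are hypothetical stand-ins, not the parquet-mr API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop-free configuration interface.
interface Config {
  String get(String key, String defaultValue);
}

// Minimal map-backed implementation, for illustration only.
final class MapConfig implements Config {
  private final Map<String, String> values = new HashMap<>();

  MapConfig set(String key, String value) {
    values.put(key, value);
    return this;
  }

  @Override
  public String get(String key, String defaultValue) {
    return values.getOrDefault(key, defaultValue);
  }
}

// Hypothetical codec factory that remembers which configuration built it.
final class CodecFactory {
  final int pageSize;
  final Config conf; // null when created without a configuration

  private CodecFactory(Config conf, int pageSize) {
    this.conf = conf;
    this.pageSize = pageSize;
  }

  static CodecFactory newFactory(int pageSize) {
    return new CodecFactory(null, pageSize);
  }

  static CodecFactory newFactory(Config conf, int pageSize) {
    return new CodecFactory(conf, pageSize);
  }
}

// Hypothetical options class whose builder carries the configuration as a field.
final class ReadOptions {
  final CodecFactory codecFactory;
  final Config conf;

  private ReadOptions(Builder b) {
    this.codecFactory = b.codecFactory;
    this.conf = b.conf;
  }

  static final class Builder {
    private CodecFactory codecFactory;
    private Config conf; // extra field used only to pass the configuration along

    Builder withConf(Config conf) {
      this.conf = conf;
      return this;
    }

    Builder withCodecFactory(CodecFactory factory) {
      this.codecFactory = factory;
      return this;
    }

    ReadOptions build() {
      if (codecFactory == null) {
        // Same branching as the diff: use the configuration only if one was set.
        codecFactory = (conf == null)
            ? CodecFactory.newFactory(0)
            : CodecFactory.newFactory(conf, 0);
      }
      return new ReadOptions(this);
    }
  }
}
```

An explicitly supplied codec factory still takes precedence; the configuration only matters when the builder has to create a factory itself, which is exactly the case where the options class ends up acting as a carrier for the configuration.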
> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
> Key: PARQUET-2347
> URL: https://issues.apache.org/jira/browse/PARQUET-2347
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Atour Mousavi Gourabi
> Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as Hadoop's Configuration
> class, which is used throughout Parquet's reading and writing logic. If we
> introduce our own interface for this, it could eventually allow users to use
> Parquet's readers and writers without the Hadoop dependency.
> In order to preserve backward compatibility and avoid breaking downstream
> projects, the constructors and methods that take Hadoop's Configuration should
> be preserved for the time being, though I would favour deprecating them in the
> near future.
> This is part of an effort that has been [discussed on the dev mailing
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].
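A rough sketch of the interface layer described in the issue, assuming a Parquet-owned configuration interface with a Hadoop-free implementation plus an adapter for the legacy type. So that it compiles without Hadoop, `java.util.Properties` stands in for Hadoop's `Configuration`; the interface name `ParquetConfig`, the `Readers.bufferSize` entry point and the `example.read.buffer.size` key are all made up for illustration. The deprecated overload shows one way existing signatures could be kept while delegating to the new one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical Parquet-owned configuration interface (name is illustrative).
interface ParquetConfig {
  String get(String key);

  default int getInt(String key, int defaultValue) {
    String v = get(key);
    return v == null ? defaultValue : Integer.parseInt(v.trim());
  }
}

// Hadoop-free implementation backed by a plain map.
final class PlainConfig implements ParquetConfig {
  private final Map<String, String> values = new HashMap<>();

  PlainConfig set(String key, String value) {
    values.put(key, value);
    return this;
  }

  @Override
  public String get(String key) {
    return values.get(key);
  }
}

// Adapter over a legacy configuration type. java.util.Properties is only a
// stand-in for Hadoop's Configuration so this sketch compiles without Hadoop.
final class LegacyConfigAdapter implements ParquetConfig {
  private final Properties legacy;

  LegacyConfigAdapter(Properties legacy) {
    this.legacy = legacy;
  }

  @Override
  public String get(String key) {
    return legacy.getProperty(key);
  }
}

// Hypothetical reader entry point with old and new signatures.
final class Readers {
  static final String BUFFER_SIZE_KEY = "example.read.buffer.size"; // made-up key

  // New entry point: depends only on the Parquet-owned interface.
  static int bufferSize(ParquetConfig conf) {
    return conf.getInt(BUFFER_SIZE_KEY, 8192);
  }

  // Old entry point kept for backward compatibility; it wraps and delegates.
  @Deprecated
  static int bufferSize(Properties legacy) {
    return bufferSize(new LegacyConfigAdapter(legacy));
  }
}
```

Callers on the legacy type keep compiling (with a deprecation warning), while new code can pass a `PlainConfig` and never touch the legacy class.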
--
This message was sent by Atlassian Jira
(v8.20.10#820010)