[
https://issues.apache.org/jira/browse/PARQUET-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769902#comment-17769902
]
ASF GitHub Bot commented on PARQUET-2351:
-----------------------------------------
wgtmac commented on code in PR #1157:
URL: https://github.com/apache/parquet-mr/pull/1157#discussion_r1339538597
##########
parquet-hadoop/src/test/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilterTest.java:
##########
@@ -195,10 +200,28 @@ private static void prepareFile(WriterVersion version,
Path file) throws IOExcep
writeData(f, writer);
}
+ private static void prepareFileWithConf(WriterVersion version, Path file)
throws IOException {
+ Configuration configuration = new Configuration();
Review Comment:
It seems that these specific configurations defined in the
ParquetOutputFormat are solely used for ParquetOutputFormat to create a
RecordWriter, which actually puts all of them into a ParquetProperties.
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L457-L484
IIUC, relying on settings from Hadoop configuration is discouraged, we
should use ParquetProperties to set all those things.
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java:
##########
@@ -358,9 +358,29 @@ public abstract static class Builder<T, SELF extends
Builder<T, SELF>> {
private long rowGroupSize = DEFAULT_BLOCK_SIZE;
private int maxPaddingSize = MAX_PADDING_SIZE_DEFAULT;
private boolean enableValidation = DEFAULT_IS_VALIDATING_ENABLED;
- private ParquetProperties.Builder encodingPropsBuilder =
+ private final ParquetProperties.Builder encodingPropsBuilder =
ParquetProperties.builder();
+ private boolean isPageSizeSet = false;
Review Comment:
TBH, these would make the code less maintainable. Not sure if Optional would
make them more organized.
> ParquetWriter/ParquetReader should parse options directly from supplied
> Configuration
> -------------------------------------------------------------------------------------
>
> Key: PARQUET-2351
> URL: https://issues.apache.org/jira/browse/PARQUET-2351
> Project: Parquet
> Issue Type: Improvement
> Reporter: Claire McGinty
> Priority: Minor
>
> As a Parquet user, my expectation is that ParquetWriter/ParquetReader will
> automatically parse any options passed in the supplied Configuration. For
> example:
>
> ```
> Configuration conf = new Configuration();
> conf.setBoolean(ParquetOutputFormat.BLOOM_FILTER_ENABLED, true);
> ParquetWriter<Car> writer = AvroParquetWriter.<Car>builder(path)
> .withSchema(Car.SCHEMA$)
> .withConf(conf)
> .build();
> ```
>
> However, the above code results in a ParquetWriter where `bloomFilterEnabled`
> is set to false-the expected way to configure this is to use
> `ParquetWriter#withBloomFilterEnabled` directly.
>
> IMO, when `ParquetWriter#withConf` is invoked, it should also delegate to
> `encodingPropsBuilder` and set any Configured properties.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)