[ https://issues.apache.org/jira/browse/PARQUET-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769902#comment-17769902 ]
ASF GitHub Bot commented on PARQUET-2351: ----------------------------------------- wgtmac commented on code in PR #1157: URL: https://github.com/apache/parquet-mr/pull/1157#discussion_r1339538597 ########## parquet-hadoop/src/test/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilterTest.java: ########## @@ -195,10 +200,28 @@ private static void prepareFile(WriterVersion version, Path file) throws IOExcep writeData(f, writer); } + private static void prepareFileWithConf(WriterVersion version, Path file) throws IOException { + Configuration configuration = new Configuration(); Review Comment: It seems that these specific configurations defined in the ParquetOutputFormat are solely used for ParquetOutputFormat to create a RecordWriter, which actually puts all of them into a ParquetProperties. https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L457-L484 IIUC, relying on settings from Hadoop configuration is discouraged, we should use ParquetProperties to set all those things. ########## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java: ########## @@ -358,9 +358,29 @@ public abstract static class Builder<T, SELF extends Builder<T, SELF>> { private long rowGroupSize = DEFAULT_BLOCK_SIZE; private int maxPaddingSize = MAX_PADDING_SIZE_DEFAULT; private boolean enableValidation = DEFAULT_IS_VALIDATING_ENABLED; - private ParquetProperties.Builder encodingPropsBuilder = + private final ParquetProperties.Builder encodingPropsBuilder = ParquetProperties.builder(); + private boolean isPageSizeSet = false; Review Comment: TBH, these would make the code less maintainable. Not sure if Optional would make them more organized. > ParquetWriter/ParquetReader should parse options directly from supplied > Configuration > ------------------------------------------------------------------------------------- > > Key: PARQUET-2351 > URL: https://issues.apache.org/jira/browse/PARQUET-2351 > Project: Parquet > Issue Type: Improvement > Reporter: Claire McGinty > Priority: Minor > > As a Parquet user, my expectation is that ParquetWriter/ParquetReader will > automatically parse any options passed in the supplied Configuration. For > example: > > ``` > Configuration conf = new Configuration(); > conf.setBoolean(ParquetOutputFormat.BLOOM_FILTER_ENABLED, true); > ParquetWriter<Car> writer = AvroParquetWriter.<Car>builder(path) > .withSchema(Car.SCHEMA$) > .withConf(conf) > .build(); > ``` > > However, the above code results in a ParquetWriter where `bloomFilterEnabled` > is set to false-the expected way to configure this is to use > `ParquetWriter#withBloomFilterEnabled` directly. > > IMO, when `ParquetWriter#withConf` is invoked, it should also delegate to > `encodingPropsBuilder` and set any Configured properties. -- This message was sent by Atlassian Jira (v8.20.10#820010)