wgtmac commented on code in PR #1173:
URL: https://github.com/apache/parquet-mr/pull/1173#discussion_r1364833487
##########
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java:
##########
@@ -543,6 +546,11 @@ public static ColumnIndex build(
    * the statistics to be added
    */
   public void add(Statistics<?> stats) {
+    if (stats.isEmpty()) {

Review Comment:
   Let me try to understand what happens here. `convertStatistics` is used to recover page statistics from the ColumnIndex, or from the original page header if the ColumnIndex is unavailable. The problem emerges when the ColumnIndex is unavailable. Am I correct? If so, why do we need these changes in the ColumnIndexBuilder?


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##########
@@ -335,10 +340,15 @@ private void processBlocksFromReader() throws IOException {
     }
   }
 
-  private void processChunk(ColumnChunkMetaData chunk,
-      CompressionCodecName newCodecName,
-      ColumnChunkEncryptorRunTime columnChunkEncryptorRunTime,
-      boolean encryptColumn) throws IOException {
+  /**
+   * Rewrite a single column with the given new compression codec or new encryptor

Review Comment:
   ```suggestion
      * Rewrite a single column with the given new compression codec and/or new encryptor
   ```


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java:
##########
@@ -612,13 +612,13 @@ public void writeDataPage(
    * @throws IOException if any I/O error occurs during writing the file
    */
   public void writeDataPage(
-    int valueCount, int uncompressedPageSize,
-    BytesInput bytes,
-    Statistics<?> statistics,
-    long rowCount,
-    Encoding rlEncoding,
-    Encoding dlEncoding,
-    Encoding valuesEncoding) throws IOException {
+    int valueCount, int uncompressedPageSize,
+    BytesInput bytes,
+    Statistics<?> statistics,
+    long rowCount,
+    Encoding rlEncoding,
+    Encoding dlEncoding,
+    Encoding valuesEncoding) throws IOException {

Review Comment:
   Could you avoid these style changes? They are unrelated to this PR.


-- 
This is an automated message from the Apache Git Service.
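The first review comment describes a two-source fallback: page statistics come from the ColumnIndex when one exists, otherwise from the page header. A minimal Java sketch of that fallback, under stated assumptions, is below. `StatsFallbackSketch`, `PageStats`, `recoverPageStats`, and the use of `Optional<List<PageStats>>` as a stand-in for a column index are all hypothetical names invented for illustration; they are not parquet-mr's `convertStatistics`, `ColumnIndex`, or `Statistics` APIs, and the sketch does not model what the new `isEmpty()` guard in `ColumnIndexBuilder.add` actually does.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of the fallback described in the review comment: prefer the
 * column index, otherwise use the page header. Not parquet-mr code.
 */
public class StatsFallbackSketch {

  /** Hypothetical per-page statistics; a null min/max means no non-null value was seen. */
  public static final class PageStats {
    final Long min;
    final Long max;
    final long nullCount;

    PageStats(Long min, Long max, long nullCount) {
      this.min = min;
      this.max = max;
      this.nullCount = nullCount;
    }

    /** Placeholder used when no source provides statistics at all. */
    static PageStats empty() {
      return new PageStats(null, null, 0L);
    }

    /** Empty means no min/max and no nulls were recorded (an all-null page is NOT empty). */
    boolean isEmpty() {
      return min == null && max == null && nullCount == 0L;
    }
  }

  /**
   * Recover statistics for one page: use the column index if present; otherwise fall back
   * to the header statistics, which may themselves be missing (null).
   */
  static PageStats recoverPageStats(Optional<List<PageStats>> columnIndex,
                                    PageStats headerStats,
                                    int pageOrdinal) {
    if (columnIndex.isPresent()) {
      return columnIndex.get().get(pageOrdinal);
    }
    // No column index: the header is the only remaining source.
    return headerStats != null ? headerStats : PageStats.empty();
  }

  public static void main(String[] args) {
    PageStats fromIndex = recoverPageStats(
        Optional.of(List.of(new PageStats(1L, 9L, 0L))), null, 0);
    PageStats fromHeader = recoverPageStats(Optional.empty(), new PageStats(3L, 4L, 2L), 0);
    PageStats missing = recoverPageStats(Optional.empty(), null, 0);
    System.out.println(fromIndex.isEmpty() + " " + fromHeader.isEmpty() + " " + missing.isEmpty());
  }
}
```

Running `main` prints `false false true`. In this sketch the `PageStats.empty()` fallback is reached only when the column index is absent, which matches the path the comment singles out.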
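The suggested Javadoc change to "and/or" reflects that rewriting a single column chunk may change its codec, encrypt it, or do both. The hypothetical Java sketch below models only that decision. `RewritePlanSketch`, `Step`, `planSteps`, the use of plain strings for codec names, and the convention that a `null` codec means "keep the current codec" are assumptions made for illustration; none of them is the actual `processChunk` logic in `ParquetRewriter`.

```java
import java.util.EnumSet;

/** Hypothetical sketch of the "and/or" semantics; not ParquetRewriter internals. */
public class RewritePlanSketch {

  /** The two independent transformations a single-column rewrite may apply. */
  public enum Step { RECOMPRESS, ENCRYPT }

  /**
   * @param currentCodec  codec the chunk is currently stored with
   * @param newCodec      requested codec, or null to keep the current one (assumption)
   * @param encryptColumn whether this column should be encrypted
   * @return the steps to apply; either, both, or (in this sketch) neither
   */
  static EnumSet<Step> planSteps(String currentCodec, String newCodec, boolean encryptColumn) {
    EnumSet<Step> steps = EnumSet.noneOf(Step.class);
    // Recompress only when a different codec is explicitly requested.
    if (newCodec != null && !newCodec.equals(currentCodec)) {
      steps.add(Step.RECOMPRESS);
    }
    // Encryption is decided independently of the codec change.
    if (encryptColumn) {
      steps.add(Step.ENCRYPT);
    }
    return steps;
  }

  public static void main(String[] args) {
    System.out.println(planSteps("SNAPPY", "ZSTD", false)); // [RECOMPRESS]
    System.out.println(planSteps("SNAPPY", null, true));    // [ENCRYPT]
    System.out.println(planSteps("SNAPPY", "ZSTD", true));  // [RECOMPRESS, ENCRYPT]
  }
}
```

Returning an `EnumSet` keeps the two decisions independent, which is what "and/or" conveys and what "or" alone would understate.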