[ https://issues.apache.org/jira/browse/PARQUET-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17777618#comment-17777618 ]
ASF GitHub Bot commented on PARQUET-2365: ----------------------------------------- ConeyLiu commented on code in PR #1173: URL: https://github.com/apache/parquet-mr/pull/1173#discussion_r1366615995 ########## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java: ########## @@ -612,13 +612,13 @@ public void writeDataPage( * @throws IOException if any I/O error occurs during writing the file */ public void writeDataPage( - int valueCount, int uncompressedPageSize, - BytesInput bytes, - Statistics<?> statistics, - long rowCount, - Encoding rlEncoding, - Encoding dlEncoding, - Encoding valuesEncoding) throws IOException { + int valueCount, int uncompressedPageSize, + BytesInput bytes, + Statistics<?> statistics, + long rowCount, + Encoding rlEncoding, + Encoding dlEncoding, + Encoding valuesEncoding) throws IOException { Review Comment: reverted > Fixes NPE when rewriting column without column index > ---------------------------------------------------- > > Key: PARQUET-2365 > URL: https://issues.apache.org/jira/browse/PARQUET-2365 > Project: Parquet > Issue Type: Bug > Reporter: Xianyang Liu > Priority: Major > > The ColumnIndex could be null in some scenes, for example, the float/double > column contains NaN or the size has exceeded the expected value. And the page > header statistics are not written anymore after we supported ColumnIndex. So > we will get NPE when rewriting the column without ColumnIndex due to we need > to get NULL page statistics when converted from the ColumnIndex(NULL) or page > header statistics(NULL). Such as the following: > ```java > java.lang.NullPointerException > at > org.apache.parquet.hadoop.ParquetFileWriter.writeDataPage(ParquetFileWriter.java:727) > at > org.apache.parquet.hadoop.ParquetFileWriter.innerWriteDataPage(ParquetFileWriter.java:663) > at > org.apache.parquet.hadoop.ParquetFileWriter.writeDataPage(ParquetFileWriter.java:650) > at > org.apache.parquet.hadoop.rewrite.ParquetRewriter.processChunk(ParquetRewriter.java:453) > at > org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocksFromReader(ParquetRewriter.java:317) > at > org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocks(ParquetRewriter.java:250) > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)