Xianyang Liu created PARQUET-2365:
-------------------------------------

             Summary: Fixes NPE when rewriting column without column index
                 Key: PARQUET-2365
                 URL: https://issues.apache.org/jira/browse/PARQUET-2365
             Project: Parquet
          Issue Type: Bug
            Reporter: Xianyang Liu

The ColumnIndex can be null in some cases, for example when a float/double column contains NaN or when the index size exceeds the expected value. In addition, page header statistics are no longer written since ColumnIndex support was added. As a result, rewriting a column that has no ColumnIndex throws an NPE: the page statistics are converted from either the ColumnIndex or the page header statistics, and both are null in this case. For example:

```java
java.lang.NullPointerException
    at org.apache.parquet.hadoop.ParquetFileWriter.writeDataPage(ParquetFileWriter.java:727)
    at org.apache.parquet.hadoop.ParquetFileWriter.innerWriteDataPage(ParquetFileWriter.java:663)
    at org.apache.parquet.hadoop.ParquetFileWriter.writeDataPage(ParquetFileWriter.java:650)
    at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processChunk(ParquetRewriter.java:453)
    at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocksFromReader(ParquetRewriter.java:317)
    at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocks(ParquetRewriter.java:250)
```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
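
The failure mode described in the report can be illustrated with a minimal, self-contained sketch. It does not use the Parquet API: `PageStats`, `resolve`, and `describe` are hypothetical names chosen for illustration, and the actual fix in parquet-mr may look quite different. The sketch only shows the general idea of treating "no statistics from either source" as a valid state instead of dereferencing null.

```java
import java.util.Optional;

public class PageStatsFallbackSketch {

    /** Hypothetical stand-in for per-page statistics; not a Parquet class. */
    static final class PageStats {
        final long nullCount;

        PageStats(long nullCount) {
            this.nullCount = nullCount;
        }
    }

    /**
     * Picks page statistics from whichever source is available.
     * Either argument may be null, mirroring the two cases in the report:
     * a missing ColumnIndex and missing page header statistics.
     */
    static Optional<PageStats> resolve(PageStats fromColumnIndex, PageStats fromPageHeader) {
        if (fromColumnIndex != null) {
            return Optional.of(fromColumnIndex);
        }
        return Optional.ofNullable(fromPageHeader);
    }

    /** Consumes the result without ever dereferencing a null reference. */
    static String describe(Optional<PageStats> stats) {
        return stats.map(s -> "nullCount=" + s.nullCount).orElse("no statistics");
    }

    public static void main(String[] args) {
        // Both sources missing: the situation that triggers the NPE in the report.
        System.out.println(describe(resolve(null, null)));
        // Statistics available from the (hypothetical) column index source.
        System.out.println(describe(resolve(new PageStats(3), null)));
    }
}
```

Returning an `Optional` here forces the caller to handle the empty case explicitly, which is one way to avoid the unchecked dereference seen in the stack trace.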