Xianyang Liu created PARQUET-2365:
-------------------------------------

             Summary: Fixes NPE when rewriting column without column index
                 Key: PARQUET-2365
                 URL: https://issues.apache.org/jira/browse/PARQUET-2365
             Project: Parquet
          Issue Type: Bug
            Reporter: Xianyang Liu


The ColumnIndex can be null in some cases, for example when a float/double 
column contains NaN or when its size exceeds the expected value. In addition, 
page header statistics are no longer written now that ColumnIndex is supported. 
As a result, rewriting a column that has no ColumnIndex throws an NPE, because 
the page statistics are null whether they are converted from the ColumnIndex 
(null) or taken from the page header statistics (null). For example:
```java
java.lang.NullPointerException
        at org.apache.parquet.hadoop.ParquetFileWriter.writeDataPage(ParquetFileWriter.java:727)
        at org.apache.parquet.hadoop.ParquetFileWriter.innerWriteDataPage(ParquetFileWriter.java:663)
        at org.apache.parquet.hadoop.ParquetFileWriter.writeDataPage(ParquetFileWriter.java:650)
        at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processChunk(ParquetRewriter.java:453)
        at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocksFromReader(ParquetRewriter.java:317)
        at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocks(ParquetRewriter.java:250)
```
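
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the actual change made for PARQUET-2365 and does not use Parquet classes: `PageStats`, `ColumnIndexSketch`, `resolveStats` and `describePage` are hypothetical names introduced only for illustration. The idea is that the statistics lookup returns an `Optional`, so the caller has to handle the case where both the column index and the page header statistics are missing instead of dereferencing null.

```java
import java.util.Optional;

// Hypothetical sketch: none of these types are Parquet classes.
final class NullSafePageStats {

    /** Stand-in for the statistics of a single data page. */
    static final class PageStats {
        final long nullCount;

        PageStats(long nullCount) {
            this.nullCount = nullCount;
        }
    }

    /** Stand-in for a column index; the reference itself may be null. */
    static final class ColumnIndexSketch {
        final long[] nullCounts; // one entry per page

        ColumnIndexSketch(long... nullCounts) {
            this.nullCounts = nullCounts;
        }
    }

    /**
     * Prefers statistics derived from the column index, falls back to the
     * page header statistics, and returns an empty Optional when both are
     * missing (the case that triggered the NPE in the report).
     */
    static Optional<PageStats> resolveStats(
            ColumnIndexSketch columnIndex, int pageOrdinal, PageStats headerStats) {
        if (columnIndex != null) {
            return Optional.of(new PageStats(columnIndex.nullCounts[pageOrdinal]));
        }
        return Optional.ofNullable(headerStats);
    }

    /** Consumes the statistics without ever dereferencing null. */
    static String describePage(Optional<PageStats> stats) {
        return stats.map(s -> "nullCount=" + s.nullCount).orElse("no statistics");
    }
}
```

With this shape, `describePage(resolveStats(null, 0, null))` yields `"no statistics"` rather than throwing, which mirrors a rewriter that has to tolerate pages carrying no statistics at all.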



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
