Xianyang Liu created PARQUET-2343:
-------------------------------------

             Summary: Fixes NPE when rewriting file with multiple rowgroups
                 Key: PARQUET-2343
                 URL: https://issues.apache.org/jira/browse/PARQUET-2343
             Project: Parquet
          Issue Type: Bug
            Reporter: Xianyang Liu


Currently, the ParquetRewriter creates the `ColumnReadStoreImpl crStore` once 
and reuses it when rewriting every block. This is incorrect: a new `crStore` 
should be created for each block that needs to be rewritten. Otherwise, 
rewriting a file with multiple row groups fails with the following exception:
```java
java.lang.NullPointerException
        at org.apache.parquet.column.impl.ColumnReaderBase.readPage(ColumnReaderBase.java:620)
        at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:594)
        at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:735)
        at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
        at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
        at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:82)
        at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocksFromReader(ParquetRewriter.java:316)
        at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocks(ParquetRewriter.java:250)
```
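A minimal, self-contained toy model of the failure mode follows. It does NOT use the real Parquet classes: `RowGroupPages` and `ColumnStore` are hypothetical stand-ins for a row group's pages and for a column read store bound to one row group. A store built from the first block and reused for later blocks runs out of pages and dereferences a null page. Creating a store per block reads each block's own pages.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class RewriteSketch {
    // Hypothetical stand-in for one row group's pages (a single column, for brevity).
    static final class RowGroupPages {
        private final Iterator<String> pages;

        RowGroupPages(List<String> pages) {
            this.pages = pages.iterator();
        }

        // Returns null once this row group's pages are used up,
        // mimicking a page reader with nothing left to hand out.
        String readPage() {
            return pages.hasNext() ? pages.next() : null;
        }
    }

    // Hypothetical stand-in for a column read store: bound to ONE row group at construction.
    static final class ColumnStore {
        private final RowGroupPages source;

        ColumnStore(RowGroupPages source) {
            this.source = source;
        }

        // Reads the next page of the bound row group and returns its length.
        // A missing page is dereferenced and surfaces as a NullPointerException.
        int readNextPageLength() {
            return source.readPage().length();
        }
    }

    // Buggy pattern (toy model): one store created from the first block, reused for all blocks.
    static List<Integer> rewriteReusingStore(List<RowGroupPages> blocks) {
        List<Integer> out = new ArrayList<>();
        if (blocks.isEmpty()) {
            return out;
        }
        ColumnStore store = new ColumnStore(blocks.get(0));
        for (RowGroupPages ignored : blocks) {
            out.add(store.readNextPageLength()); // still reads block 0's pages
        }
        return out;
    }

    // Fixed pattern (toy model): a fresh store for every block being rewritten.
    static List<Integer> rewritePerBlock(List<RowGroupPages> blocks) {
        List<Integer> out = new ArrayList<>();
        for (RowGroupPages block : blocks) {
            ColumnStore store = new ColumnStore(block);
            out.add(store.readNextPageLength());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> lengths = rewritePerBlock(List.of(
                new RowGroupPages(List.of("abc")),
                new RowGroupPages(List.of("de"))));
        System.out.println(lengths); // prints [3, 2]
    }
}
```

In this toy model, tying the store's lifetime to the block it reads is what keeps each read pointed at that block's pages. With two single-page blocks, `rewriteReusingStore` throws a `NullPointerException` on the second block, while `rewritePerBlock` returns one page length per block.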



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
