Xianyang Liu created PARQUET-2343:
-------------------------------------
Summary: Fixes NPE when rewriting file with multiple rowgroups
Key: PARQUET-2343
URL: https://issues.apache.org/jira/browse/PARQUET-2343
Project: Parquet
Issue Type: Bug
Reporter: Xianyang Liu
Currently, `ParquetRewriter` creates a single `ColumnReadStoreImpl crStore` and
reuses it when rewriting every block. This is incorrect: `crStore` should be
created for each block that needs to be rewritten. Otherwise, rewriting a file
with multiple row groups fails with the following exception:
```java
java.lang.NullPointerException
    at org.apache.parquet.column.impl.ColumnReaderBase.readPage(ColumnReaderBase.java:620)
    at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:594)
    at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:735)
    at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
    at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
    at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:82)
    at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocksFromReader(ParquetRewriter.java:316)
    at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocks(ParquetRewriter.java:250)
```
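To illustrate the difference between a shared store and a per-block store, here is a minimal toy sketch. It does not use the Parquet API: `ToyReadStore`, `rewriteWithSharedStore`, `rewriteWithPerBlockStore` and `sampleBlocks` are hypothetical names invented for this example, and the way the toy store runs out of pages is only an assumed model of why a reused `crStore` could hit a null page on a later block.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PerBlockStoreSketch {

    /** Hypothetical stand-in for a read store bound to one block's pages. */
    static final class ToyReadStore {
        private final Iterator<int[]> pages;

        ToyReadStore(List<int[]> blockPages) {
            this.pages = blockPages.iterator();
        }

        /** Returns the next page, or null once this block's pages are exhausted. */
        int[] readPage() {
            return pages.hasNext() ? pages.next() : null;
        }
    }

    /** Buggy shape: one store, built from the first block, reused for all blocks. */
    static int rewriteWithSharedStore(List<List<int[]>> blocks) {
        ToyReadStore store = new ToyReadStore(blocks.get(0));
        int valuesCopied = 0;
        for (List<int[]> block : blocks) {
            for (int i = 0; i < block.size(); i++) {
                // On any block after the first, readPage() returns null here,
                // so dereferencing it throws a NullPointerException.
                valuesCopied += store.readPage().length;
            }
        }
        return valuesCopied;
    }

    /** Fixed shape: a fresh store is created for each block. */
    static int rewriteWithPerBlockStore(List<List<int[]>> blocks) {
        int valuesCopied = 0;
        for (List<int[]> block : blocks) {
            ToyReadStore store = new ToyReadStore(block);
            for (int i = 0; i < block.size(); i++) {
                valuesCopied += store.readPage().length;
            }
        }
        return valuesCopied;
    }

    /** Two toy blocks: the first has two pages, the second has one. */
    static List<List<int[]>> sampleBlocks() {
        return Arrays.asList(
            Arrays.asList(new int[] {1, 2}, new int[] {3}),
            Arrays.asList(new int[] {4, 5, 6}));
    }

    public static void main(String[] args) {
        System.out.println("per-block store copied "
            + rewriteWithPerBlockStore(sampleBlocks()) + " values");
        try {
            rewriteWithSharedStore(sampleBlocks());
        } catch (NullPointerException e) {
            System.out.println("shared store failed on a later block");
        }
    }
}
```

In this toy model a file with a single block works in both variants, which would be consistent with the failure only showing up for files with multiple row groups.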