[ 
https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760843#comment-17760843
 ] 

ASF GitHub Bot commented on PARQUET-2343:
-----------------------------------------

ConeyLiu commented on PR #1136:
URL: https://github.com/apache/parquet-mr/pull/1136#issuecomment-1700757199

   Hi @wgtmac, could you help review this? Thanks a lot.




> Fixes NPE when rewriting file with multiple rowgroups
> -----------------------------------------------------
>
>                 Key: PARQUET-2343
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2343
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Xianyang Liu
>            Priority: Major
>
> Currently, the ParquetRewriter creates the `ColumnReadStoreImpl crStore` once 
> and reuses it when rewriting every block. This is incorrect: we should 
> create a new `crStore` for each block that needs to be rewritten. 
> Otherwise, the rewrite fails with the following exception:
> ```java
> java.lang.NullPointerException
>       at org.apache.parquet.column.impl.ColumnReaderBase.readPage(ColumnReaderBase.java:620)
>       at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:594)
>       at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:735)
>       at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
>       at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
>       at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:82)
>       at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocksFromReader(ParquetRewriter.java:316)
>       at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocks(ParquetRewriter.java:250)
> ```
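
The failure mode described above can be illustrated with a small toy model. This is only a sketch and not the parquet-mr code: `BlockReadStore`, `rewriteWithSharedStore`, and `rewriteWithPerBlockStore` are hypothetical names, and each "page" is just a string. It assumes that a read store is bound to the pages of one row group and returns null once those pages are exhausted, which is roughly the situation the stack trace suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PerBlockReadStoreSketch {

    /** Hypothetical stand-in for a read store bound to one row group's pages. */
    static final class BlockReadStore {
        private final Iterator<String> pages;

        BlockReadStore(List<String> pagesOfOneBlock) {
            this.pages = pagesOfOneBlock.iterator();
        }

        /** Returns the next page, or null once this block's pages are exhausted. */
        String readPage() {
            return pages.hasNext() ? pages.next() : null;
        }
    }

    /** Buggy variant: a store built from the first block is reused for every block. */
    static List<Integer> rewriteWithSharedStore(List<List<String>> blocks) {
        List<Integer> rewritten = new ArrayList<>();
        BlockReadStore store = new BlockReadStore(blocks.get(0));
        for (List<String> block : blocks) {
            for (int i = 0; i < block.size(); i++) {
                // From the second block on, readPage() returns null and this throws an NPE.
                rewritten.add(store.readPage().length());
            }
        }
        return rewritten;
    }

    /** Fixed variant: a fresh store is created for each block being rewritten. */
    static List<Integer> rewriteWithPerBlockStore(List<List<String>> blocks) {
        List<Integer> rewritten = new ArrayList<>();
        for (List<String> block : blocks) {
            BlockReadStore store = new BlockReadStore(block);
            for (int i = 0; i < block.size(); i++) {
                rewritten.add(store.readPage().length());
            }
        }
        return rewritten;
    }

    public static void main(String[] args) {
        // Two "row groups": the first has two pages, the second has one.
        List<List<String>> blocks = Arrays.asList(
                Arrays.asList("p1", "p22"),
                Arrays.asList("p333"));

        System.out.println("per-block store: " + rewriteWithPerBlockStore(blocks));
        try {
            rewriteWithSharedStore(blocks);
        } catch (NullPointerException e) {
            System.out.println("shared store failed on a later block with an NPE");
        }
    }
}
```

With a single row group both variants behave the same, which would be consistent with the issue only showing up for files with multiple row groups.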



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
