[ 
https://issues.apache.org/jira/browse/PARQUET-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White resolved PARQUET-9.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.6.0
         Assignee: Tom White

Fixed in 
https://git-wip-us.apache.org/repos/asf?p=incubator-parquet-mr.git;a=commit;h=2d8ebdbe00786823658bcdd2817e6b5afee15b25

> InternalParquetRecordReader will not read multiple blocks when filtering
> ------------------------------------------------------------------------
>
>                 Key: PARQUET-9
>                 URL: https://issues.apache.org/jira/browse/PARQUET-9
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Ryan Blue
>            Assignee: Tom White
>             Fix For: 1.6.0
>
>
> The InternalParquetRecordReader keeps a count of the records it has 
> processed and uses that count to decide when it is finished and when to load 
> a new row group. But when it wraps a FilteredRecordReader, the count is not 
> updated for records that are filtered out, so once the reader exhausts the 
> block it is reading, it keeps calling read() on the filtered reader and 
> passes null values to the caller.
> The quick fix is to detect null values returned by the record reader and 
> advance the count so that the next row group is read. The longer-term 
> solution is to account for the filtered records correctly.
> The pull request for the quick fix is 
> [#9|https://github.com/apache/incubator-parquet-mr/pull/9].
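
For illustration, the counting problem and the quick fix described above can be modelled with a small standalone sketch. This is not the parquet-mr code: FilteredReadSketch, FilteringBlockReader, readAll and the nullAware flag are hypothetical names, row groups are modelled as plain lists of integers, and the assumption is that the filtering reader returns null once its block has no more matching records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class FilteredReadSketch {

    /** Hypothetical stand-in for a filtering record reader over one row group. */
    static final class FilteringBlockReader {
        private final Iterator<Integer> records;
        private final Predicate<Integer> filter;

        FilteringBlockReader(List<Integer> block, Predicate<Integer> filter) {
            this.records = block.iterator();
            this.filter = filter;
        }

        /** Returns the next matching record, or null once the block is exhausted. */
        Integer read() {
            while (records.hasNext()) {
                Integer r = records.next();
                if (filter.test(r)) {
                    return r;
                }
            }
            return null;
        }
    }

    /**
     * Reads every row group through the filter. With nullAware == false the
     * outer loop trusts its own count and hands nulls to the caller; with
     * nullAware == true it applies the quick fix from the description.
     */
    static List<Integer> readAll(List<List<Integer>> rowGroups,
                                 Predicate<Integer> filter,
                                 boolean nullAware) {
        long total = 0;
        for (List<Integer> group : rowGroups) {
            total += group.size();
        }

        List<Integer> out = new ArrayList<>();
        long current = 0;      // records the outer reader believes it has processed
        long loadedSoFar = 0;  // unfiltered record count of all row groups loaded so far
        int nextGroup = 0;
        FilteringBlockReader reader = null;

        while (current < total) {
            if (current == loadedSoFar) {
                // The count says the current block is done: load the next row group.
                List<Integer> block = rowGroups.get(nextGroup++);
                reader = new FilteringBlockReader(block, filter);
                loadedSoFar += block.size();
            }
            Integer value = reader.read();
            current++;  // counts one record, even if read() skipped several
            if (value == null && nullAware) {
                // Quick fix: a null means the block is exhausted, so jump the
                // count forward to force the next row group to be loaded.
                current = loadedSoFar;
                continue;
            }
            out.add(value);  // without the fix, this can be null
        }
        return out;
    }
}
```

Calling readAll on two row groups {1, 2, 3, 4} and {5, 6, 7, 8} with a filter that keeps even numbers returns a list containing nulls when nullAware is false, and only 2, 4, 6, 8 when it is true. The longer-term solution mentioned in the description would instead have the filtering reader report how many records it skipped, so the count stays accurate without relying on null.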



--
This message was sent by Atlassian JIRA
(v6.2#6252)
