[ 
https://issues.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116262#comment-16116262
 ] 

Junjie Chen commented on PARQUET-1061:
--------------------------------------

In the Parquet storage handler, when parquet.task.side.metadata is set, 
rowGroupOffsets is null.  When Hive does not use the storage handler, it 
uses its own ParquetRecordReaderWrapper, which calls the deprecated 
ParquetInputSplit constructor, so rowGroupOffsets is not null.  So this is 
not a Parquet issue. 
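
To make the two paths concrete, here is a minimal, self-contained sketch of 
the branching described above. It is not parquet-mr code: the Block class, 
the matchesFilter flag and the selectBlocks method are hypothetical stand-ins 
that I made up for illustration, and the real reader works on BlockMetaData 
and a configured filter instead. 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the branch in the quoted snippet. All names here are
// hypothetical and only mirror the shape of the real logic.
class RowGroupFilterSketch {

    // Stand-in for a row group: its file offset and whether a data filter
    // (for example a dictionary filter) would keep it.
    static final class Block {
        final long offset;
        final boolean matchesFilter;

        Block(long offset, boolean matchesFilter) {
            this.offset = offset;
            this.matchesFilter = matchesFilter;
        }
    }

    static List<Block> selectBlocks(List<Block> footerBlocks, long[] rowGroupOffsets) {
        List<Block> selected = new ArrayList<>();
        if (rowGroupOffsets != null) {
            // The split already names its row groups (the deprecated
            // ParquetInputSplit constructor path): only verify, never filter.
            for (Block block : footerBlocks) {
                for (long offset : rowGroupOffsets) {
                    if (block.offset == offset) {
                        selected.add(block);
                        break;
                    }
                }
            }
            if (selected.size() != rowGroupOffsets.length) {
                throw new IllegalStateException(
                    "All of the offsets in the split should be found in the file."
                    + " expected: " + Arrays.toString(rowGroupOffsets)
                    + " found: " + selected.size() + " blocks");
            }
            return selected;
        }
        // No offsets in the split (task-side metadata): apply data filters.
        for (Block block : footerBlocks) {
            if (block.matchesFilter) {
                selected.add(block);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(new Block(4L, true), new Block(100L, false));
        System.out.println("task-side metadata: " + selectBlocks(blocks, null).size());
        System.out.println("explicit offsets:   "
            + selectBlocks(blocks, new long[] {4L, 100L}).size());
    }
}
```

In this model the second block is dropped only when rowGroupOffsets is null, 
which is the behaviour seen with Hive's own reader wrapper: the offsets are 
always supplied, so the filtering branch is never reached. 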

> parquet dictionary filter does not work.
> ----------------------------------------
>
>                 Key: PARQUET-1061
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1061
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.9.0
>         Environment: Hive 2.2.0 + Parquet-mr 1.9.0/master
>            Reporter: Junjie Chen
>
> When performing a selective query, we observed that the dictionary filter 
> was not applied.  Please see the following code snippet: 
>     if (rowGroupOffsets != null) {
>       // verify a row group was found for each offset
>       List<BlockMetaData> blocks = reader.getFooter().getBlocks();
>       if (blocks.size() != rowGroupOffsets.length) {
>         throw new IllegalStateException(
>             "All of the offsets in the split should be found in the file."
>             + " expected: " + Arrays.toString(rowGroupOffsets)
>             + " found: " + blocks);
>       }
>     } else {
> *Why are data filters applied only when rowGroupOffsets is null?*
>       // apply data filters
>       reader.filterRowGroups(getFilter(configuration));
>     }
> The filter takes effect after I move the code from the else block into the 
> second-level if. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
