[ https://issues.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092409#comment-16092409 ]
Junjie Chen commented on PARQUET-1061:
--------------------------------------

Yes, I have already set parquet.filter.dictionary.enabled to true.

ParquetFileReader#filterRowGroups is called from ParquetRecordReader#initializeInternalReader, but there it is only called when *rowGroupOffsets == null*. If rowGroupOffsets == null means there are no row groups in the split (am I right?), then the call is in the wrong place.

> parquet dictionary filter does not work.
> ----------------------------------------
>
>                 Key: PARQUET-1061
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1061
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.9.0
>         Environment: Hive 2.2.0 + Parquet-mr 1.9.0/master
>            Reporter: Junjie Chen
>
> When performing a selective query, we observed that the dictionary filter was not applied. Please see the following code snippet:
>
>     if (rowGroupOffsets != null) {
>         // verify a row group was found for each offset
>         List<BlockMetaData> blocks = reader.getFooter().getBlocks();
>         if (blocks.size() != rowGroupOffsets.length) {
>             throw new IllegalStateException(
>                 "All of the offsets in the split should be found in the file."
>                 + " expected: " + Arrays.toString(rowGroupOffsets)
>                 + " found: " + blocks);
>         }
>     } else {
>         *Why is the data filter applied only when rowGroupOffsets is null?*
>         // apply data filters
>         reader.filterRowGroups(getFilter(configuration));
>     }
>
> I can get the filter applied by moving the code from the else block into the body of the outer if block.
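To make the control flow under discussion concrete, here is a hypothetical, simplified Java model of the quoted branch. RowGroup, SimpleReader, verifyOffsets, initializeOriginal and initializeProposed are illustrative stand-ins, not parquet-mr classes or methods. The "proposed" variant, which applies the filter in both branches after the offsets are verified, is an assumption about the intended fix; the report itself only says the else-block code was moved into the if block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical, simplified model of the branch quoted from
 * ParquetRecordReader#initializeInternalReader. NOT the parquet-mr API:
 * every type and method here is illustrative only.
 */
public class RowGroupFilterSketch {

    /** Stand-in for BlockMetaData: a row group's start offset and the max value of one column. */
    static final class RowGroup {
        final long offset;
        final int maxValue;

        RowGroup(long offset, int maxValue) {
            this.offset = offset;
            this.maxValue = maxValue;
        }

        @Override
        public String toString() {
            return "RowGroup{offset=" + offset + ", max=" + maxValue + "}";
        }
    }

    /** Stand-in for ParquetFileReader: holds the row groups selected for this split. */
    static final class SimpleReader {
        final List<RowGroup> blocks;

        SimpleReader(List<RowGroup> blocks) {
            this.blocks = new ArrayList<>(blocks);
        }

        /** Drops row groups the predicate rules out (stands in for stats/dictionary filtering). */
        void filterRowGroups(Predicate<RowGroup> mightMatch) {
            blocks.removeIf(mightMatch.negate());
        }
    }

    /** Mirrors the quoted check: every offset in the split must map to a row group. */
    static void verifyOffsets(SimpleReader reader, long[] rowGroupOffsets) {
        if (reader.blocks.size() != rowGroupOffsets.length) {
            throw new IllegalStateException(
                "All of the offsets in the split should be found in the file."
                    + " expected: " + Arrays.toString(rowGroupOffsets)
                    + " found: " + reader.blocks);
        }
    }

    /** Original shape: the data filter runs only when the split carries no row group offsets. */
    static void initializeOriginal(SimpleReader reader, long[] rowGroupOffsets, Predicate<RowGroup> filter) {
        if (rowGroupOffsets != null) {
            verifyOffsets(reader, rowGroupOffsets);
        } else {
            reader.filterRowGroups(filter);
        }
    }

    /** Assumed fix: verify offsets when present, then apply the data filter in either case. */
    static void initializeProposed(SimpleReader reader, long[] rowGroupOffsets, Predicate<RowGroup> filter) {
        if (rowGroupOffsets != null) {
            verifyOffsets(reader, rowGroupOffsets);
        }
        reader.filterRowGroups(filter);
    }

    public static void main(String[] args) {
        List<RowGroup> groups = List.of(new RowGroup(4L, 10), new RowGroup(1000L, 99));
        long[] offsets = {4L, 1000L};
        Predicate<RowGroup> mightMatch = g -> g.maxValue >= 50; // e.g. WHERE col >= 50

        SimpleReader original = new SimpleReader(groups);
        initializeOriginal(original, offsets, mightMatch);

        SimpleReader proposed = new SimpleReader(groups);
        initializeProposed(proposed, offsets, mightMatch);

        // prints "row groups kept, original=2, proposed=1"
        System.out.println("row groups kept, original=" + original.blocks.size()
            + ", proposed=" + proposed.blocks.size());
    }
}
```

With offsets present, the original shape never calls filterRowGroups, so the row group whose values cannot satisfy the predicate is still read; the assumed fix drops it while keeping the offset verification intact.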