[
https://issues-test.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265591#comment-16265591
]
Jorge Machado commented on PARQUET-1061:
----------------------------------------
Hi guys,
I'm trying to read a parquet file in parallel outside of Hadoop. Spark uses the
ParquetInputSplit class.
I would like to use it too, but I'm wondering how to get the rowGroupOffsets[].
Is this the start position of every single block?
Thanks
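For context, here is a minimal sketch of how the per-row-group offsets can be read from the file footer with the parquet-mr API. This is an illustration only, not code from the issue; the class name is made up, and it assumes a Hadoop Configuration and Path are available, using ParquetFileReader.readFooter plus BlockMetaData.getStartingPos, which returns the byte position where each row group (block) starts.

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.format.converter.ParquetMetadataConverter;
    import org.apache.parquet.hadoop.ParquetFileReader;
    import org.apache.parquet.hadoop.metadata.BlockMetaData;
    import org.apache.parquet.hadoop.metadata.ParquetMetadata;

    public class RowGroupOffsets {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path file = new Path(args[0]);
        // Read only the footer; it lists every row group (block) in the file.
        ParquetMetadata footer =
            ParquetFileReader.readFooter(conf, file, ParquetMetadataConverter.NO_FILTER);
        List<BlockMetaData> blocks = footer.getBlocks();
        for (BlockMetaData block : blocks) {
          // getStartingPos() is the byte offset of the row group's first column chunk,
          // i.e. the kind of value that rowGroupOffsets[] in a split refers to.
          System.out.println("row group @ " + block.getStartingPos()
              + ", rows = " + block.getRowCount());
        }
      }
    }

Since each offset identifies one row group, the blocks can then be handed to separate readers or threads if the file is to be processed in parallel.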
> parquet dictionary filter does not work.
> ----------------------------------------
>
> Key: PARQUET-1061
> URL: https://issues-test.apache.org/jira/browse/PARQUET-1061
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.9.0
> Environment: Hive 2.2.0 + Parquet-mr 1.9.0/master
> Reporter: Junjie Chen
> Priority: Major
>
> When performing a selective query, we observed that the dictionary filter was not
> applied. Please see the following code snippet.
> if (rowGroupOffsets != null) {
>   // verify a row group was found for each offset
>   List<BlockMetaData> blocks = reader.getFooter().getBlocks();
>   if (blocks.size() != rowGroupOffsets.length) {
>     throw new IllegalStateException(
>         "All of the offsets in the split should be found in the file."
>         + " expected: " + Arrays.toString(rowGroupOffsets)
>         + " found: " + blocks);
>   }
> } else {
>   *Why is the data filter only applied when rowGroupOffsets is null?*
>   // apply data filters
>   reader.filterRowGroups(getFilter(configuration));
> }
> I can enable the filter after moving the else-block code into the second-level if.
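One way to read that suggestion, shown here purely as a sketch and not as the actual patch, is to also call reader.filterRowGroups(...) after the offset verification in the rowGroupOffsets != null branch, so the dictionary filter runs in both cases:

    if (rowGroupOffsets != null) {
      // verify a row group was found for each offset
      List<BlockMetaData> blocks = reader.getFooter().getBlocks();
      if (blocks.size() != rowGroupOffsets.length) {
        throw new IllegalStateException(
            "All of the offsets in the split should be found in the file."
            + " expected: " + Arrays.toString(rowGroupOffsets)
            + " found: " + blocks);
      }
      // sketch: also apply the data filters (including the dictionary filter)
      // when the split carries explicit row group offsets
      reader.filterRowGroups(getFilter(configuration));
    } else {
      // apply data filters
      reader.filterRowGroups(getFilter(configuration));
    }

Whether downstream code that matches offsets to blocks still behaves correctly after filtering would need to be checked as part of such a change.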