[ https://issues.apache.org/jira/browse/DRILL-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173820#comment-16173820 ]
ASF GitHub Bot commented on DRILL-5795:
---------------------------------------

Github user dprofeta commented on the issue:

    https://github.com/apache/drill/pull/949

    I will add a unit test that checks the number of rowgroups scanned by the groupscan, to verify that the filter is able to prune rowgroups.

> Filter pushdown for parquet handles multi rowgroup file
> -------------------------------------------------------
>
>                 Key: DRILL-5795
>                 URL: https://issues.apache.org/jira/browse/DRILL-5795
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: Damien Profeta
>            Assignee: Damien Profeta
>              Labels: doc-impacting
>
> DRILL-1950 implemented the filter pushdown for parquet files, but only in the
> case of one rowgroup per parquet file. In the case of multiple rowgroups per
> file, it detects that a rowgroup can be pruned but then tells the drillbit
> to read the whole file, which leads to performance issues.
> Having multiple rowgroups per file helps to handle partitioned datasets while
> still reading only the relevant subset of data, without ending up with more
> files than really needed.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
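
The behaviour the issue asks for, pruning individual rowgroups inside one file rather than falling back to reading the whole file, can be illustrated with a minimal sketch. Everything here is hypothetical and is not Drill's actual API: the `RowGroupStats` class, the `pruneForEquals` method, and the file name are made up for illustration, and the sketch assumes an equality filter on a single long column evaluated against per-rowgroup min/max statistics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Drill.
final class RowGroupPruningSketch {

    /** Stand-in for one rowgroup's min/max statistics on a single long column. */
    static final class RowGroupStats {
        final String file;
        final int rowGroupIndex;
        final long min;
        final long max;

        RowGroupStats(String file, int rowGroupIndex, long min, long max) {
            this.file = file;
            this.rowGroupIndex = rowGroupIndex;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Keeps only the rowgroups whose [min, max] range may contain the value,
     * i.e. the rowgroups that a filter "column = value" cannot rule out.
     * The result is a list of (file, rowgroup) pairs, not whole files.
     */
    static List<RowGroupStats> pruneForEquals(List<RowGroupStats> rowGroups, long value) {
        List<RowGroupStats> kept = new ArrayList<>();
        for (RowGroupStats rg : rowGroups) {
            if (value >= rg.min && value <= rg.max) {
                kept.add(rg);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // One file with three rowgroups covering disjoint value ranges.
        List<RowGroupStats> rowGroups = new ArrayList<>();
        rowGroups.add(new RowGroupStats("part-0.parquet", 0, 1, 100));
        rowGroups.add(new RowGroupStats("part-0.parquet", 1, 101, 200));
        rowGroups.add(new RowGroupStats("part-0.parquet", 2, 201, 300));

        // Only the rowgroups that survive pruning are listed for scanning.
        for (RowGroupStats rg : pruneForEquals(rowGroups, 150)) {
            System.out.println(rg.file + " rowgroup " + rg.rowGroupIndex);
        }
    }
}
```

A unit test of the kind mentioned in the comment could then assert on the number of rowgroups left after pruning (here, one of three) rather than on the number of files.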