[ 
https://issues.apache.org/jira/browse/DRILL-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697273#comment-16697273
 ] 

ASF GitHub Bot commented on DRILL-6865:
---------------------------------------

arina-ielchiieva commented on a change in pull request #1552:  DRILL-6865: 
Query returns wrong result when filter pruning happens
URL: https://github.com/apache/drill/pull/1552#discussion_r235950497
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##########
 @@ -85,6 +85,8 @@
 
   private List<EndpointAffinity> endpointAffinities;
   private ParquetGroupScanStatistics parquetGroupScanStatistics;
+  // whether all row groups of this group scan fully matches the filter
 
 Review comment:
   fully match

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Query returns wrong result when filter pruning happens
> ------------------------------------------------------
>
>                 Key: DRILL-6865
>                 URL: https://issues.apache.org/jira/browse/DRILL-6865
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.14.0
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Blocker
>             Fix For: 1.15.0
>
>
> In DRILL-5796 was implemented removing the filter from the plan when some (or 
> all) row groups of parquet table fully match the filter.
> For the case when filter has some predicates which parquet filter predicate 
> does not support, they can be omitted for some cases from the resulting 
> filter predicate. When row groups fully match predicates which left in the 
> filter, the whole filter is removed from the plan and the wrong result is 
> returned.
> Example of the query for reproducing this bug:
> {code:sql}
> create table dfs.tmp.`multi/t1` as select * from cp.`tpch/nation.parquet` 
> where n_nationkey > 5;
> create table dfs.tmp.`multi/t2` as select * from cp.`tpch/nation.parquet` 
> where n_nationkey < 5;
> select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey like 
> '%10%';
> {code}
> returns
> {noformat}
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | dir0  | n_nationkey  |     n_name      | n_regionkey  |                     
>                                  n_comment                                    
>                   |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | t1    | 6            | FRANCE          | 3            | refully final 
> requests. regular, ironi                                                      
>                         |
> | t1    | 7            | GERMANY         | 3            | l platelets. 
> regular accounts x-ray: unusual, regular acco                                 
>                          |
> | t1    | 8            | INDIA           | 2            | ss excuses cajole 
> slyly across the packages. deposits print aroun                               
>                     |
> | t1    | 9            | INDONESIA       | 2            |  slyly express 
> asymptotes. regular deposits haggle slyly. carefully ironic hockey players 
> sleep blithely. carefull  |
> | t1    | 10           | IRAN            | 4            | efully alongside of 
> the slyly final dependencies.                                                 
>                   |
> | t1    | 11           | IRAQ            | 4            | nic deposits boost 
> atop the quickly final requests? quickly regula                               
>                    |
> | t1    | 12           | JAPAN           | 2            | ously. final, 
> express gifts cajole a                                                        
>                         |
> | t1    | 13           | JORDAN          | 4            | ic deposits are 
> blithely about the carefully regular pa                                       
>                       |
> | t1    | 14           | KENYA           | 0            |  pending excuses 
> haggle furiously deposits. pending, express pinto beans wake fluffily past t  
>                      |
> | t1    | 15           | MOROCCO         | 0            | rns. blithely bold 
> courts among the closely regular packages use furiously bold platelets?       
>                    |
> | t1    | 16           | MOZAMBIQUE      | 0            | s. ironic, unusual 
> asymptotes wake blithely r                                                    
>                    |
> | t1    | 17           | PERU            | 1            | platelets. blithely 
> pending dependencies use fluffily across the even pinto beans. carefully 
> silent accoun          |
> | t1    | 18           | CHINA           | 2            | c dependencies. 
> furiously express notornis sleep slyly regular accounts. ideas sleep. depos   
>                       |
> | t1    | 19           | ROMANIA         | 3            | ular asymptotes are 
> about the furious multipliers. express dependencies nag above the ironically 
> ironic account     |
> | t1    | 20           | SAUDI ARABIA    | 4            | ts. silent requests 
> haggle. closely express packages sleep across the blithely                    
>                   |
> | t1    | 21           | VIETNAM         | 2            | hely enticingly 
> express accounts. even, final                                                 
>                       |
> | t1    | 22           | RUSSIA          | 3            |  requests against 
> the platelets use never according to the quickly regular pint                 
>                     |
> | t1    | 23           | UNITED KINGDOM  | 3            | eans boost 
> carefully special requests. accounts are. carefull                            
>                            |
> | t1    | 24           | UNITED STATES   | 1            | y final packages. 
> slow foxes cajole quickly. quickly silent platelets breach ironic accounts. 
> unusual pinto be      |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> {noformat}
> but single row should be returned:
> {noformat}
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | dir0  | n_nationkey  | n_name  | n_regionkey  |                      
> n_comment                      |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | t1    | 10           | IRAN    | 4            | efully alongside of the 
> slyly final dependencies.   |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> {noformat}
> Filter is removed from the plan, but it contains a predicate which wasn't 
> applied:
> {noformat}
> 00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0, 
> cumulative cost = {77.9 rows, 115.9 cpu, 38.0 io, 0.0 network, 0.0 memory}, 
> id = 400
> 00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount 
> = 19.0, cumulative cost = {76.0 rows, 114.0 cpu, 38.0 io, 0.0 network, 0.0 
> memory}, id = 399
> 00-02        Project(T1¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR 
> T1¦¦**): rowcount = 19.0, cumulative cost = {57.0 rows, 95.0 cpu, 38.0 io, 
> 0.0 network, 0.0 memory}, id = 398
> 00-03          Project(T1¦¦**=[$0], n_nationkey=[$1]) : rowType = 
> RecordType(DYNAMIC_STAR T1¦¦**, ANY n_nationkey): rowcount = 19.0, cumulative 
> cost = {38.0 rows, 76.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 397
> 00-04            Scan(table=[[dfs, tmp, multy]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tmp/multy/t1/0_0_0.parquet]], 
> selectionRoot=file:/tmp/multy, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType = 
> RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 19.0, cumulative 
> cost = {19.0 rows, 38.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 396
> {noformat}
> ----
> Additionally, a filter is not removed from the plan when parquet table with 
> single row group is queried
> But for the case, when parquet table has a single row group, the filter is 
> left in the plan:
> {code:sql}
> create table dfs.tmp.`singleRowGroupTable` as select * from 
> cp.`tpch/nation.parquet`;
> explain plan for select * from dfs.tmp.`singleRowGroupTable` where 
> n_nationkey > -1;
> {code}
> returns plan
> {noformat}
> 00-00    Screen
> 00-01      Project(**=[$0])
> 00-02        Project(T0¦¦**=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[>($1, -1)])
> 00-05              Project(T0¦¦**=[$0], n_nationkey=[$1])
> 00-06                Scan(table=[[dfs, tmp, singleRowGroupTable]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=file:/tmp/singleRowGroupTable]], 
> selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to