[ 
https://issues.apache.org/jira/browse/DRILL-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6865:
---------------------------------------
    Description: 
In DRILL-5796 was implemented removing the filter from the plan when some (or 
all) row groups of parquet table fully match the filter.

For the case when filter has some predicates which parquet filter predicate 
does not support, they can be omitted for some cases from the resulting filter 
predicate. When row groups fully match predicates which left in the filter, the 
whole filter is removed from the plan and the wrong result is returned.

Example of the query for reproducing this bug:
{code:sql}
create table dfs.tmp.`multy/t1` as select * from cp.`tpch/nation.parquet` where 
n_nationkey > 5;
create table dfs.tmp.`multy/t2` as select * from cp.`tpch/nation.parquet` where 
n_nationkey < 5;
select * from dfs.tmp.`multy` where n_nationkey > 5 and n_nationkey like '%10%';
{code}
returns 
{noformat}
+-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
| dir0  | n_nationkey  |     n_name      | n_regionkey  |                       
                               n_comment                                        
              |
+-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
| t1    | 6            | FRANCE          | 3            | refully final 
requests. regular, ironi                                                        
                      |
| t1    | 7            | GERMANY         | 3            | l platelets. regular 
accounts x-ray: unusual, regular acco                                           
               |
| t1    | 8            | INDIA           | 2            | ss excuses cajole 
slyly across the packages. deposits print aroun                                 
                  |
| t1    | 9            | INDONESIA       | 2            |  slyly express 
asymptotes. regular deposits haggle slyly. carefully ironic hockey players 
sleep blithely. carefull  |
| t1    | 10           | IRAN            | 4            | efully alongside of 
the slyly final dependencies.                                                   
                |
| t1    | 11           | IRAQ            | 4            | nic deposits boost 
atop the quickly final requests? quickly regula                                 
                 |
| t1    | 12           | JAPAN           | 2            | ously. final, express 
gifts cajole a                                                                  
              |
| t1    | 13           | JORDAN          | 4            | ic deposits are 
blithely about the carefully regular pa                                         
                    |
| t1    | 14           | KENYA           | 0            |  pending excuses 
haggle furiously deposits. pending, express pinto beans wake fluffily past t    
                   |
| t1    | 15           | MOROCCO         | 0            | rns. blithely bold 
courts among the closely regular packages use furiously bold platelets?         
                 |
| t1    | 16           | MOZAMBIQUE      | 0            | s. ironic, unusual 
asymptotes wake blithely r                                                      
                 |
| t1    | 17           | PERU            | 1            | platelets. blithely 
pending dependencies use fluffily across the even pinto beans. carefully silent 
accoun          |
| t1    | 18           | CHINA           | 2            | c dependencies. 
furiously express notornis sleep slyly regular accounts. ideas sleep. depos     
                    |
| t1    | 19           | ROMANIA         | 3            | ular asymptotes are 
about the furious multipliers. express dependencies nag above the ironically 
ironic account     |
| t1    | 20           | SAUDI ARABIA    | 4            | ts. silent requests 
haggle. closely express packages sleep across the blithely                      
                |
| t1    | 21           | VIETNAM         | 2            | hely enticingly 
express accounts. even, final                                                   
                    |
| t1    | 22           | RUSSIA          | 3            |  requests against the 
platelets use never according to the quickly regular pint                       
              |
| t1    | 23           | UNITED KINGDOM  | 3            | eans boost carefully 
special requests. accounts are. carefull                                        
               |
| t1    | 24           | UNITED STATES   | 1            | y final packages. 
slow foxes cajole quickly. quickly silent platelets breach ironic accounts. 
unusual pinto be      |
+-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
{noformat}

but single row should be returned:
{noformat}
+-------+--------------+---------+--------------+-----------------------------------------------------+
| dir0  | n_nationkey  | n_name  | n_regionkey  |                      
n_comment                      |
+-------+--------------+---------+--------------+-----------------------------------------------------+
| t1    | 10           | IRAN    | 4            | efully alongside of the slyly 
final dependencies.   |
+-------+--------------+---------+--------------+-----------------------------------------------------+
{noformat}
Filter is removed from the plan, but it contained condition which wasn't 
applied:
{noformat}
00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0, 
cumulative cost = {77.9 rows, 115.9 cpu, 38.0 io, 0.0 network, 0.0 memory}, id 
= 400
00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 
19.0, cumulative cost = {76.0 rows, 114.0 cpu, 38.0 io, 0.0 network, 0.0 
memory}, id = 399
00-02        Project(T1¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR T1¦¦**): 
rowcount = 19.0, cumulative cost = {57.0 rows, 95.0 cpu, 38.0 io, 0.0 network, 
0.0 memory}, id = 398
00-03          Project(T1¦¦**=[$0], n_nationkey=[$1]) : rowType = 
RecordType(DYNAMIC_STAR T1¦¦**, ANY n_nationkey): rowcount = 19.0, cumulative 
cost = {38.0 rows, 76.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 397
00-04            Scan(table=[[dfs, tmp, multy]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=/tmp/multy/t1/0_0_0.parquet]], 
selectionRoot=file:/tmp/multy, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType = 
RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 19.0, cumulative cost 
= {19.0 rows, 38.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 396
{noformat}
----

Additionally, a filter is not removed from the plan when parquet table with 
single row group is queried

But for the case, when parquet table has a single row group, the filter is left 
in the plan:
{code:sql}
create table dfs.tmp.`singleRowGroupTable` as select * from 
cp.`tpch/nation.parquet`;
explain plan for select * from dfs.tmp.`singleRowGroupTable` where n_nationkey 
> -1;
{code}
returns plan
{noformat}
00-00    Screen
00-01      Project(**=[$0])
00-02        Project(T0¦¦**=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[>($1, -1)])
00-05              Project(T0¦¦**=[$0], n_nationkey=[$1])
00-06                Scan(table=[[dfs, tmp, singleRowGroupTable]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=file:/tmp/singleRowGroupTable]], 
selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
{noformat}

  was:
In DRILL-5796 was implemented removing the filter from the plan when some (or 
all) row groups of parquet table fully match the filter.

But for the case, when parquet table has a single row group, the filter is left 
in the plan:
{code:sql}
create table dfs.tmp.`singleRowGroupTable` as select * from 
cp.`tpch/nation.parquet`;
explain plan for select * from dfs.tmp.`singleRowGroupTable` where n_nationkey 
> -1;
{code}
returns plan
{noformat}
00-00    Screen
00-01      Project(**=[$0])
00-02        Project(T0¦¦**=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[>($1, -1)])
00-05              Project(T0¦¦**=[$0], n_nationkey=[$1])
00-06                Scan(table=[[dfs, tmp, singleRowGroupTable]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=file:/tmp/singleRowGroupTable]], 
selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
{noformat}


> Query returns wrong result when filter pruning happens
> ------------------------------------------------------
>
>                 Key: DRILL-6865
>                 URL: https://issues.apache.org/jira/browse/DRILL-6865
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.14.0
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>             Fix For: 1.15.0
>
>
> In DRILL-5796 was implemented removing the filter from the plan when some (or 
> all) row groups of parquet table fully match the filter.
> For the case when filter has some predicates which parquet filter predicate 
> does not support, they can be omitted for some cases from the resulting 
> filter predicate. When row groups fully match predicates which left in the 
> filter, the whole filter is removed from the plan and the wrong result is 
> returned.
> Example of the query for reproducing this bug:
> {code:sql}
> create table dfs.tmp.`multy/t1` as select * from cp.`tpch/nation.parquet` 
> where n_nationkey > 5;
> create table dfs.tmp.`multy/t2` as select * from cp.`tpch/nation.parquet` 
> where n_nationkey < 5;
> select * from dfs.tmp.`multy` where n_nationkey > 5 and n_nationkey like 
> '%10%';
> {code}
> returns 
> {noformat}
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | dir0  | n_nationkey  |     n_name      | n_regionkey  |                     
>                                  n_comment                                    
>                   |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | t1    | 6            | FRANCE          | 3            | refully final 
> requests. regular, ironi                                                      
>                         |
> | t1    | 7            | GERMANY         | 3            | l platelets. 
> regular accounts x-ray: unusual, regular acco                                 
>                          |
> | t1    | 8            | INDIA           | 2            | ss excuses cajole 
> slyly across the packages. deposits print aroun                               
>                     |
> | t1    | 9            | INDONESIA       | 2            |  slyly express 
> asymptotes. regular deposits haggle slyly. carefully ironic hockey players 
> sleep blithely. carefull  |
> | t1    | 10           | IRAN            | 4            | efully alongside of 
> the slyly final dependencies.                                                 
>                   |
> | t1    | 11           | IRAQ            | 4            | nic deposits boost 
> atop the quickly final requests? quickly regula                               
>                    |
> | t1    | 12           | JAPAN           | 2            | ously. final, 
> express gifts cajole a                                                        
>                         |
> | t1    | 13           | JORDAN          | 4            | ic deposits are 
> blithely about the carefully regular pa                                       
>                       |
> | t1    | 14           | KENYA           | 0            |  pending excuses 
> haggle furiously deposits. pending, express pinto beans wake fluffily past t  
>                      |
> | t1    | 15           | MOROCCO         | 0            | rns. blithely bold 
> courts among the closely regular packages use furiously bold platelets?       
>                    |
> | t1    | 16           | MOZAMBIQUE      | 0            | s. ironic, unusual 
> asymptotes wake blithely r                                                    
>                    |
> | t1    | 17           | PERU            | 1            | platelets. blithely 
> pending dependencies use fluffily across the even pinto beans. carefully 
> silent accoun          |
> | t1    | 18           | CHINA           | 2            | c dependencies. 
> furiously express notornis sleep slyly regular accounts. ideas sleep. depos   
>                       |
> | t1    | 19           | ROMANIA         | 3            | ular asymptotes are 
> about the furious multipliers. express dependencies nag above the ironically 
> ironic account     |
> | t1    | 20           | SAUDI ARABIA    | 4            | ts. silent requests 
> haggle. closely express packages sleep across the blithely                    
>                   |
> | t1    | 21           | VIETNAM         | 2            | hely enticingly 
> express accounts. even, final                                                 
>                       |
> | t1    | 22           | RUSSIA          | 3            |  requests against 
> the platelets use never according to the quickly regular pint                 
>                     |
> | t1    | 23           | UNITED KINGDOM  | 3            | eans boost 
> carefully special requests. accounts are. carefull                            
>                            |
> | t1    | 24           | UNITED STATES   | 1            | y final packages. 
> slow foxes cajole quickly. quickly silent platelets breach ironic accounts. 
> unusual pinto be      |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> {noformat}
> but single row should be returned:
> {noformat}
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | dir0  | n_nationkey  | n_name  | n_regionkey  |                      
> n_comment                      |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | t1    | 10           | IRAN    | 4            | efully alongside of the 
> slyly final dependencies.   |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> {noformat}
> Filter is removed from the plan, but it contained condition which wasn't 
> applied:
> {noformat}
> 00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0, 
> cumulative cost = {77.9 rows, 115.9 cpu, 38.0 io, 0.0 network, 0.0 memory}, 
> id = 400
> 00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount 
> = 19.0, cumulative cost = {76.0 rows, 114.0 cpu, 38.0 io, 0.0 network, 0.0 
> memory}, id = 399
> 00-02        Project(T1¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR 
> T1¦¦**): rowcount = 19.0, cumulative cost = {57.0 rows, 95.0 cpu, 38.0 io, 
> 0.0 network, 0.0 memory}, id = 398
> 00-03          Project(T1¦¦**=[$0], n_nationkey=[$1]) : rowType = 
> RecordType(DYNAMIC_STAR T1¦¦**, ANY n_nationkey): rowcount = 19.0, cumulative 
> cost = {38.0 rows, 76.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 397
> 00-04            Scan(table=[[dfs, tmp, multy]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tmp/multy/t1/0_0_0.parquet]], 
> selectionRoot=file:/tmp/multy, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType = 
> RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 19.0, cumulative 
> cost = {19.0 rows, 38.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 396
> {noformat}
> ----
> Additionally, a filter is not removed from the plan when parquet table with 
> single row group is queried
> But for the case, when parquet table has a single row group, the filter is 
> left in the plan:
> {code:sql}
> create table dfs.tmp.`singleRowGroupTable` as select * from 
> cp.`tpch/nation.parquet`;
> explain plan for select * from dfs.tmp.`singleRowGroupTable` where 
> n_nationkey > -1;
> {code}
> returns plan
> {noformat}
> 00-00    Screen
> 00-01      Project(**=[$0])
> 00-02        Project(T0¦¦**=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[>($1, -1)])
> 00-05              Project(T0¦¦**=[$0], n_nationkey=[$1])
> 00-06                Scan(table=[[dfs, tmp, singleRowGroupTable]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=file:/tmp/singleRowGroupTable]], 
> selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to