[ 
https://issues.apache.org/jira/browse/DRILL-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Ordynskiy updated DRILL-6465:
-----------------------------------
    Description: 
For several SQL operators Transitive closure is not working during Partition 
Pruning and Filter Pushdown for the left table in Join.
 If I use several local conditions, then Drill scans full left table in Join.
 But if we move additional conditions to the WHERE statement, then Transitive 
closure works fine for all joined tables

*Query BETWEEN:*
{code:java}
EXPLAIN PLAN FOR
SELECT * FROM hive.`h_tab1` t1
JOIN hive.`h_tab2` t2
ON t1.y=t2.y
AND t2.y BETWEEN 1987 AND 1988;
{code}
*Expected result:*
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=8, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
Partition(values:[1988, 12, 2])]{code}
*Actual result:*
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), 
Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), 
Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), 
Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), 
Partition(values:[1991, 3, 4])
]
{code}
*There is the same Transitive closure behavior for this logical operators:*
 * NOT IN
 * LIKE
 * NOT LIKE

Also Transitive closure is not working during Partition Pruning and Filter 
Pushdown for this comparison operators:

*Query <*
{code:java}
EXPLAIN PLAN FOR
SELECT * FROM hive.`h_tab1` t1
JOIN hive.`h_tab2` t2
ON t1.y=t2.y
AND t2.y < 1988;
{code}
*Expected result:*
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2])]{code}
*Actual result:*
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), 
Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), 
Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), 
Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), 
Partition(values:[1991, 3, 4])]
{code}
*There is the same Transitive closure behavior for this comparison operators:***

>, <=, >=, <>

 But if I use the WHERE clause instead of local Join condition, Transitive 
closure works correct for all joined tables.

*Query:*
{code:java}
EXPLAIN PLAN FOR
SELECT * FROM hive.`h_tab1` t1
JOIN hive.`h_tab2` t2
ON t1.y=t2.y
WHERE t2.y < 1988;
{code}
*Result:*
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2])]
{code}
 

It was examples for Drill Partition Pruning queries with hive. The same issues 
are in the Drill Filter Pushdown queries.

*Filter Pushdown 'BETWEEN' query example:*
{code:java}
EXPLAIN PLAN FOR
SELECT * FROM dfs.ctas_parquet.`/DRILL_6173_pushdown/tab1` t1
JOIN dfs.ctas_parquet.`/DRILL_6173_pushdown/tab2` t2
ON t1.y=t2.y
AND t2.y BETWEEN 1987 AND 1988;
{code}
*Expected result:*
{code:java}
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_7.parquet], 
ReadEntryWithPath 
...
ReadEntryWithPath 
[path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_6.parquet]], 
selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, 
numFiles=8, numRowGroups=8
{code}
*Actual result:*
{code:java}
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1]], 
selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, 
numFiles=1, numRowGroups=16
{code}
 

Issues can be reproduced using parquet files in the attachment "drill.zip"

 

  was:
For several SQL operators Transitive closure is not working during Partition 
Pruning and Filter Pushdown for the left table in Join.
If I use several local conditions, then Drill scans full left table in Join.
But if we move additional conditions to the WHERE statement, then Transitive 
closure works fine for all joined tables

*Query BETWEEN:*

 
{code:java}
EXPLAIN PLAN FOR
SELECT * FROM hive.`h_tab1` t1
JOIN hive.`h_tab2` t2
ON t1.y=t2.y
AND t2.y BETWEEN 1987 AND 1988;
{code}
*Expected result:*

 

 
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=8, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
Partition(values:[1988, 12, 2])]{code}
*Actual result:*

 

 
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), 
Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), 
Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), 
Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), 
Partition(values:[1991, 3, 4])
]
{code}
*There is the same Transitive closure behavior for this logical operators:*
 * NOT IN
 * LIKE
 * NOT LIKE

Also Transitive closure is not working during Partition Pruning and Filter 
Pushdown for this comparison operators:

*Query <*

 
{code:java}
EXPLAIN PLAN FOR
SELECT * FROM hive.`h_tab1` t1
JOIN hive.`h_tab2` t2
ON t1.y=t2.y
AND t2.y < 1988;
{code}
*Expected result:*

 

 
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2])]{code}
*Actual result:*
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), 
Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), 
Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), 
Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), 
Partition(values:[1991, 3, 4])]
{code}
*There is the same Transitive closure behavior for this comparison operators:***

>, <=, >=, <>

 

But if I use the WHERE clause instead of local Join condition, Transitive 
closure works correct for all joined tables.

*Query:*

 
{code:java}
EXPLAIN PLAN FOR
SELECT * FROM hive.`h_tab1` t1
JOIN hive.`h_tab2` t2
ON t1.y=t2.y
WHERE t2.y < 1988;
{code}
*Result:*

 

 
{code:java}
Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), 
Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
Partition(values:[1987, 7, 2])]
{code}
 

It was examples for Drill Partition Pruning queries with hive. The same issues 
are in the Drill Filter Pushdown queries.

*Filter Pushdown 'BETWEEN' query example:*
{code:java}
EXPLAIN PLAN FOR
SELECT * FROM dfs.ctas_parquet.`/DRILL_6173_pushdown/tab1` t1
JOIN dfs.ctas_parquet.`/DRILL_6173_pushdown/tab2` t2
ON t1.y=t2.y
AND t2.y BETWEEN 1987 AND 1988;
{code}
*Expected result:*

 
{code:java}
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_7.parquet], 
ReadEntryWithPath 
...
ReadEntryWithPath 
[path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_6.parquet]], 
selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, 
numFiles=8, numRowGroups=8
{code}
 

*Actual result:*

 
{code:java}
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1]], 
selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, 
numFiles=1, numRowGroups=16
{code}
 

Issues can be reproduced using parquet files in the attachment "drill.zip"

 

 

 


> Transitive closure is not working in Drill for Join with multiple local 
> conditions
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-6465
>                 URL: https://issues.apache.org/jira/browse/DRILL-6465
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>         Environment: [^drill.zip]
>            Reporter: Denys Ordynskiy
>            Priority: Major
>         Attachments: drill.zip
>
>
> For several SQL operators Transitive closure is not working during Partition 
> Pruning and Filter Pushdown for the left table in Join.
>  If I use several local conditions, then Drill scans full left table in Join.
>  But if we move additional conditions to the WHERE statement, then Transitive 
> closure works fine for all joined tables
> *Query BETWEEN:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> AND t2.y BETWEEN 1987 AND 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=8, partitions= [Partition(values:[1987, 5, 1]), 
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
> Partition(values:[1988, 12, 2])]{code}
> *Actual result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, 
> 1]), Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
> Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), 
> Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), 
> Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), 
> Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), 
> Partition(values:[1991, 3, 4])
> ]
> {code}
> *There is the same Transitive closure behavior for this logical operators:*
>  * NOT IN
>  * LIKE
>  * NOT LIKE
> Also Transitive closure is not working during Partition Pruning and Filter 
> Pushdown for this comparison operators:
> *Query <*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> AND t2.y < 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), 
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2])]{code}
> *Actual result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, 
> 1]), Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
> Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), 
> Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), 
> Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), 
> Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), 
> Partition(values:[1991, 3, 4])]
> {code}
> *There is the same Transitive closure behavior for this comparison 
> operators:***
> >, <=, >=, <>
>  But if I use the WHERE clause instead of local Join condition, Transitive 
> closure works correct for all joined tables.
> *Query:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> WHERE t2.y < 1988;
> {code}
> *Result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), 
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2])]
> {code}
>  
> It was examples for Drill Partition Pruning queries with hive. The same 
> issues are in the Drill Filter Pushdown queries.
> *Filter Pushdown 'BETWEEN' query example:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM dfs.ctas_parquet.`/DRILL_6173_pushdown/tab1` t1
> JOIN dfs.ctas_parquet.`/DRILL_6173_pushdown/tab2` t2
> ON t1.y=t2.y
> AND t2.y BETWEEN 1987 AND 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_7.parquet], 
> ReadEntryWithPath 
> ...
> ReadEntryWithPath 
> [path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_6.parquet]], 
> selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, 
> numFiles=8, numRowGroups=8
> {code}
> *Actual result:*
> {code:java}
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1]], 
> selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, 
> numFiles=1, numRowGroups=16
> {code}
>  
> Issues can be reproduced using parquet files in the attachment "drill.zip"
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to