[
https://issues.apache.org/jira/browse/DRILL-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502046#comment-16502046
]
Vitalii Diravka commented on DRILL-6465:
----------------------------------------
The issue for [NOT IN
operator|https://github.com/apache/drill/blob/6fcaf4268eddcb09010b5d9c5dfb3b3be5c3f903/exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetFilterPushdownWithTransitivePredicates.java#L227]
for local join condition is solved and will be present with a new Calcite
version. CALCITE-2275
Other cases should be investigated separately.
> Transitive closure is not working in Drill for Join with multiple local
> conditions
> ----------------------------------------------------------------------------------
>
> Key: DRILL-6465
> URL: https://issues.apache.org/jira/browse/DRILL-6465
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.14.0
> Reporter: Denys Ordynskiy
> Assignee: Vitalii Diravka
> Priority: Major
> Fix For: 1.14.0
>
> Attachments: drill.zip
>
>
> For several SQL operators Transitive closure is not working during Partition
> Pruning and Filter Pushdown for the left table in Join.
> If I use several local conditions, then Drill scans full left table in Join.
> But if we move additional conditions to the WHERE statement, then Transitive
> closure works fine for all joined tables
> *Query BETWEEN:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> AND t2.y BETWEEN 1987 AND 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1),
> columns=[`**`], numPartitions=8, partitions= [Partition(values:[1987, 5, 1]),
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]),
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]),
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]),
> Partition(values:[1988, 12, 2])]{code}
> *Actual result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1),
> columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5,
> 1]), Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]),
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]),
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]),
> Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]),
> Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]),
> Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]),
> Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]),
> Partition(values:[1991, 3, 4])
> ]
> {code}
> *There is the same Transitive closure behavior for this logical operators:*
> * NOT IN
> * LIKE
> * NOT LIKE
> Also Transitive closure is not working during Partition Pruning and Filter
> Pushdown for this comparison operators:
> *Query <*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> AND t2.y < 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1),
> columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]),
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]),
> Partition(values:[1987, 7, 2])]{code}
> *Actual result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1),
> columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5,
> 1]), Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]),
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]),
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]),
> Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]),
> Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]),
> Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]),
> Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]),
> Partition(values:[1991, 3, 4])]
> {code}
> *There is the same Transitive closure behavior for this comparison
> operators:***
> >, <=, >=, <>
> But if I use the WHERE clause instead of local Join condition, Transitive
> closure works correct for all joined tables.
> *Query:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> WHERE t2.y < 1988;
> {code}
> *Result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1),
> columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]),
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]),
> Partition(values:[1987, 7, 2])]
> {code}
>
> It was examples for Drill Partition Pruning queries with hive. The same
> issues are in the Drill Filter Pushdown queries.
> *Filter Pushdown 'BETWEEN' query example:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM dfs.ctas_parquet.`/DRILL_6173_pushdown/tab1` t1
> JOIN dfs.ctas_parquet.`/DRILL_6173_pushdown/tab2` t2
> ON t1.y=t2.y
> AND t2.y BETWEEN 1987 AND 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_7.parquet],
> ReadEntryWithPath
> ...
> ReadEntryWithPath
> [path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_6.parquet]],
> selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1,
> numFiles=8, numRowGroups=8
> {code}
> *Actual result:*
> {code:java}
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1]],
> selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1,
> numFiles=1, numRowGroups=16
> {code}
>
> Issues can be reproduced using parquet files in the attachment "drill.zip"
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)