Aman Sinha created DRILL-3803:
---------------------------------
Summary: Support inequality filter evaluation as part of join operators
Key: DRILL-3803
URL: https://issues.apache.org/jira/browse/DRILL-3803
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Relational Operators
Reporter: Aman Sinha
Assignee: Aman Sinha
Currently Drill evaluates an inequality join condition in a separate Filter
operator above the join, after the equality condition has been applied. See
the plan below:
{code}
0: jdbc:drill:zk=local> explain plan for select n1.n_name from
cp.`tpch/nation.parquet` n1 inner join cp.`tpch/region.parquet` n2 on
n1.n_nationkey = n2.n_nationkey and n1.n_regionkey < n2.n_regionkey;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(n_name=[$2])
00-02 SelectionVectorRemover
00-03 Filter(condition=[<($1, $4)])
00-04 HashJoin(condition=[=($0, $3)], joinType=[inner])
00-06 Project(n_nationkey=[$2], n_regionkey=[$0], n_name=[$1])
00-08 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
columns=[`n_nationkey`, `n_regionkey`, `n_name`]]])
00-05 Project(n_nationkey0=[$0], n_regionkey0=[$1])
00-07 Project(n_nationkey=[$1], n_regionkey=[$0])
00-09 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]],
selectionRoot=classpath:/tpch/region.parquet, numFiles=1,
columns=[`n_nationkey`, `n_regionkey`]]])
{code}
Suppose the inequality filter is highly selective but the equality join's
output cardinality is large. The join then produces many rows that the Filter
above it immediately discards. It would be substantially better to push this
filter into the join and evaluate both the equality and the inequality
conditions as part of the join.
This is an enhancement. We may decide at a later time to split this into two
JIRAs: one for HashJoin and one for MergeJoin.
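To illustrate the difference, here is a minimal Python sketch, not Drill code. The function names, the row layout (dicts with a key `k` and a value `v`), and the `residual` callback are all hypothetical and chosen only for this example. The first function mirrors the current plan shape (join on equality, then filter); the second checks the inequality while probing the hash table, so non-matching pairs are never emitted by the join.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    """Build side: group rows by the equality join key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def hash_join_then_filter(left, right, key, residual):
    """Current plan shape: emit every equality match, then apply the filter.

    Returns (result_rows, number_of_rows_emitted_by_the_join).
    """
    table = build_hash_table(right, key)
    joined = [(l, r) for l in left for r in table.get(l[key], [])]
    filtered = [(l, r) for (l, r) in joined if residual(l, r)]
    return filtered, len(joined)


def hash_join_with_residual(left, right, key, residual):
    """Proposed shape: evaluate the inequality during the probe.

    Returns (result_rows, number_of_rows_emitted_by_the_join).
    """
    table = build_hash_table(right, key)
    out = []
    for l in left:
        for r in table.get(l[key], []):
            # Equality already holds via the hash lookup; test the
            # inequality before emitting the pair.
            if residual(l, r):
                out.append((l, r))
    return out, len(out)


# Hypothetical sample data: every row shares join key 1, so the
# equality join alone yields 4 x 4 = 16 pairs.
left = [{"k": 1, "v": i} for i in range(4)]
right = [{"k": 1, "v": j} for j in range(4)]


def less_than(l, r):
    """Residual inequality condition: left.v < right.v."""
    return l["v"] < r["v"]


rows_a, emitted_a = hash_join_then_filter(left, right, "k", less_than)
rows_b, emitted_b = hash_join_with_residual(left, right, "k", less_than)
```

Both variants return the same 6 result rows here, but the first emits 16 rows from the join while the second emits only 6. A MergeJoin variant would apply the same check while iterating over runs of equal keys; that case is not sketched here.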