[ https://issues.apache.org/jira/browse/DRILL-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624438#comment-16624438 ]
Boaz Ben-Zvi commented on DRILL-6758:
-------------------------------------
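Here is an example. In the plan below, the HashJoin (00-03) returns both *o_custkey* and *c_custkey*, although the Project directly above it (00-02) keeps only *o_orderstatus* and *o_orderkey*: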
{code}
0: jdbc:drill:zk=local> explain plan including all attributes for select ord.o_orderstatus, ord.o_orderkey from cp.`tpch/orders.parquet` ord left join cp.`tpch/customer.parquet` cust on ord.o_custkey = cust.c_custkey;
+------+------+
| text | json |
+------+------+
| 00-00 Screen : rowType = RecordType(ANY o_orderstatus, ANY o_orderkey): rowcount = 15000.0, cumulative cost = {64500.0 rows, 300000.0 cpu, 0.0 io, 0.0 network, 26400.000000000004 memory}, id = 11045
00-01 Project(o_orderstatus=[$0], o_orderkey=[$1]) : rowType = RecordType(ANY o_orderstatus, ANY o_orderkey): rowcount = 15000.0, cumulative cost = {63000.0 rows, 298500.0 cpu, 0.0 io, 0.0 network, 26400.000000000004 memory}, id = 11044
00-02 Project(o_orderstatus=[$1], o_orderkey=[$2]) : rowType = RecordType(ANY o_orderstatus, ANY o_orderkey): rowcount = 15000.0, cumulative cost = {48000.0 rows, 268500.0 cpu, 0.0 io, 0.0 network, 26400.000000000004 memory}, id = 11043
00-03 HashJoin(condition=[=($0, $3)], joinType=[left]) : rowType = RecordType(ANY o_custkey, ANY o_orderstatus, ANY o_orderkey, ANY c_custkey): rowcount = 15000.0, cumulative cost = {33000.0 rows, 238500.0 cpu, 0.0 io, 0.0 network, 26400.000000000004 memory}, id = 11042
00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_orderstatus`, `o_orderkey`]]]) : rowType = RecordType(ANY o_custkey, ANY o_orderstatus, ANY o_orderkey): rowcount = 15000.0, cumulative cost = {15000.0 rows, 45000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11040
00-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`]]]) : rowType = RecordType(ANY c_custkey): rowcount = 1500.0, cumulative cost = {1500.0 rows, 1500.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 11041
{code}
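To make the plan above concrete, here is a minimal Java sketch of the check the planner would need: given the HashJoin's output columns and the set of column indexes the parent operator reads, list the columns nobody uses. This is not Drill code. The class name JoinColumnPruning and the method unusedColumns are hypothetical, and the inputs are copied by hand from operators 00-03 and 00-02.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Drill or Calcite.
public class JoinColumnPruning {

    // Returns the join output columns whose index is not referenced downstream.
    public static List<String> unusedColumns(List<String> joinOutput, Set<Integer> referenced) {
        List<String> unused = new ArrayList<>();
        for (int i = 0; i < joinOutput.size(); i++) {
            if (!referenced.contains(i)) {
                unused.add(joinOutput.get(i));
            }
        }
        return unused;
    }

    public static void main(String[] args) {
        // rowType of HashJoin 00-03, in output order ($0..$3).
        List<String> joinOutput =
            List.of("o_custkey", "o_orderstatus", "o_orderkey", "c_custkey");
        // Project 00-02 reads only $1 and $2.
        Set<Integer> referenced = Set.of(1, 2);
        // Prints [o_custkey, c_custkey]
        System.out.println(unusedColumns(joinOutput, referenced));
    }
}
```

The two columns it reports are exactly the join keys, which the HashJoin needs for the condition =($0, $3) but which do not need to be copied into its outgoing batch.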
> Hash Join should not return the join columns when they are not needed
> downstream
> --------------------------------------------------------------------------------
>
> Key: DRILL-6758
> URL: https://issues.apache.org/jira/browse/DRILL-6758
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators, Query Planning & Optimization
> Affects Versions: 1.14.0
> Reporter: Boaz Ben-Zvi
> Assignee: Hanumath Rao Maduri
> Priority: Minor
> Fix For: 1.15.0
>
>
> Currently the Hash-Join operator returns all of its incoming columns, from
> both sides. When the join columns are not used further downstream, this is
> wasted work (allocating vectors, copying each value, etc.).
> Suggestion: Have the planner pass this information to the Hash-Join
> operator, so that it can skip returning these columns.
>