[
https://issues.apache.org/jira/browse/DRILL-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pritesh Maker reassigned DRILL-5912:
------------------------------------
Assignee: Boaz Ben-Zvi
> Hash Join Enhancement: Avoid copying probe side values
> ------------------------------------------------------
>
> Key: DRILL-5912
> URL: https://issues.apache.org/jira/browse/DRILL-5912
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators
> Affects Versions: 1.11.0
> Reporter: Boaz Ben-Zvi
> Assignee: Boaz Ben-Zvi
> Priority: Minor
>
> When the Hash Join operator (inner or left outer) performs the "probe and
> project" task, it copies each probe-side value to be projected. Example:
> {code}
> public void projectProbeRecord(int probeIndex, int outIndex)
>     throws SchemaChangeException
> {
>     {
>         vv15 .copyFromSafe((probeIndex), (outIndex), vv12);
>     }
>     {
>         vv21 .copyFromSafe((probeIndex), (outIndex), vv18);
>     }
> }
> {code}
> When the build side has no duplicate keys and no spilling took place, each
> probe-side (outer) row is projected exactly once (left outer join) or at
> most once (inner join).
> In such (common) cases, we could avoid the above copy, and just transfer the
> value vectors as is (or add a Selection Vector 2 for the inner join, to
> eliminate the unmatched entries).
> This can be a significant performance enhancement, as copying each set of
> values is much more expensive than transferring vectors (e.g., performing
> the copy 64K times, plus allocating the vectors and possibly resizing them
> for variable-sized types).
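
To make the contrast between the two strategies concrete, here is a minimal sketch in plain Java. It does not use Drill's value vector classes: the class `ProbeProjectionSketch`, its methods, and the `SelectedBatch` holder are all hypothetical names invented for illustration, and plain `int[]` arrays stand in for value vectors. It assumes the case described above (no duplicate build keys, no spilling), so each probe row is emitted at most once.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Apache Drill.
final class ProbeProjectionSketch {

    /** Copy path: allocates a new output column and copies one value per matched probe row. */
    static int[] copyMatched(int[] probeColumn, boolean[] matched) {
        int[] out = new int[probeColumn.length];      // fresh allocation for the output
        int outIndex = 0;
        for (int probeIndex = 0; probeIndex < probeColumn.length; probeIndex++) {
            if (matched[probeIndex]) {
                out[outIndex++] = probeColumn[probeIndex];   // per-row copy
            }
        }
        return Arrays.copyOf(out, outIndex);          // trim to the number of matches
    }

    /** Output of the transfer path: the original column plus the indices of surviving rows. */
    static final class SelectedBatch {
        final int[] column;      // same array instance as the incoming probe column
        final int[] selection;   // row indices that matched (plays the role of a selection vector)

        SelectedBatch(int[] column, int[] selection) {
            this.column = column;
            this.selection = selection;
        }

        int size() { return selection.length; }

        int get(int i) { return column[selection[i]]; }
    }

    /** Transfer path: hands the column over untouched and records only which rows survive. */
    static SelectedBatch transferMatched(int[] probeColumn, boolean[] matched) {
        int[] selection = new int[probeColumn.length];
        int count = 0;
        for (int probeIndex = 0; probeIndex < probeColumn.length; probeIndex++) {
            if (matched[probeIndex]) {
                selection[count++] = probeIndex;      // one small index per row, shared by all columns
            }
        }
        return new SelectedBatch(probeColumn, Arrays.copyOf(selection, count));
    }

    public static void main(String[] args) {
        int[] probe = {10, 20, 30, 40};
        boolean[] matched = {true, false, true, true};   // inner join: row 1 has no match

        System.out.println(Arrays.toString(copyMatched(probe, matched)));

        SelectedBatch batch = transferMatched(probe, matched);
        for (int i = 0; i < batch.size(); i++) {
            System.out.println(batch.get(i));
        }
    }
}
```

In this sketch the copy path does its work once per projected column, while the transfer path builds a single index list that every column can share. For a left outer join every probe row survives, so under the same assumptions the selection list could be skipped and the column handed over as is.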