[ https://issues.apache.org/jira/browse/DRILL-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pritesh Maker reassigned DRILL-5912:
------------------------------------

    Assignee: Boaz Ben-Zvi

> Hash Join Enhancement: Avoid copying probe side values
> ------------------------------------------------------
>
>                 Key: DRILL-5912
>                 URL: https://issues.apache.org/jira/browse/DRILL-5912
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>            Priority: Minor
>
> When the Hash Join Operator (inner, or left outer) performs the "probe and
> project" task, it copies each probe-side value that is to be projected.
> Example of the generated code:
> {code}
> public void projectProbeRecord(int probeIndex, int outIndex)
>     throws SchemaChangeException
> {
>     {
>         vv15 .copyFromSafe((probeIndex), (outIndex), vv12);
>     }
>     {
>         vv21 .copyFromSafe((probeIndex), (outIndex), vv18);
>     }
> }
> {code}
> When the build side has no duplicate-key entries and no spilling took place,
> each of the outer (probe-side) values is projected exactly once (for left
> outer join), or at most once (for inner join).
> In such (common) cases, we could avoid the above copy and just transfer the
> value vectors as is (or add a Selection Vector 2 for the inner join, to
> eliminate the unmatched entries).
> This can be a significant performance enhancement, as copying each set of
> values is much more expensive than transferring vectors (e.g., performing the
> copy 64K times, plus allocating the vectors, and possibly resizing them for
> variable-sized types).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
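
The proposal in the issue can be illustrated with a toy model. The sketch below is not Drill code: plain int arrays stand in for value vectors, and the names ProbeTransferSketch, OutBatch, projectByCopy and projectByTransfer are hypothetical. It assumes the preconditions stated in the issue (no duplicate keys on the build side, no spilling) and a batch of at most 64K rows, so that row indices fit in the 2-byte entries of a selection vector. It models only the probe-side column; build-side columns are left out.

```java
import java.util.Arrays;

/** Toy model only: int[] stands in for a probe-side value vector. */
class ProbeTransferSketch {

    /** Hypothetical outgoing batch: a column buffer plus an optional 2-byte selection vector. */
    static final class OutBatch {
        final int[] values;     // column buffer (copied or transferred)
        final char[] sv2;       // indices of surviving rows, or null if all rows survive
        final int recordCount;  // number of logical rows in the batch

        OutBatch(int[] values, char[] sv2, int recordCount) {
            this.values = values;
            this.sv2 = sv2;
            this.recordCount = recordCount;
        }

        /** Reads logical row i, going through the selection vector when one is present. */
        int get(int i) {
            return sv2 == null ? values[i] : values[sv2[i]];
        }
    }

    /** Copy path: each projected probe value is copied into a new buffer, row by row. */
    static OutBatch projectByCopy(int[] probe, boolean[] matched, boolean leftOuter) {
        int[] out = new int[probe.length];          // allocation the transfer path avoids
        int outIndex = 0;
        for (int probeIndex = 0; probeIndex < probe.length; probeIndex++) {
            if (matched[probeIndex] || leftOuter) {
                out[outIndex++] = probe[probeIndex]; // one copy per projected row
            }
        }
        return new OutBatch(out, null, outIndex);
    }

    /** Transfer path: reuse the probe buffer; an inner join adds an SV2 of matched rows. */
    static OutBatch projectByTransfer(int[] probe, boolean[] matched, boolean leftOuter) {
        if (leftOuter) {
            // Every probe row is projected exactly once, so the buffer passes through as is.
            return new OutBatch(probe, null, probe.length);
        }
        char[] sv2 = new char[probe.length];
        int n = 0;
        for (int i = 0; i < probe.length; i++) {
            if (matched[i]) {
                sv2[n++] = (char) i;                 // record the index only, not the value
            }
        }
        return new OutBatch(probe, Arrays.copyOf(sv2, n), n);
    }

    public static void main(String[] args) {
        int[] probe = {10, 20, 30, 40};
        boolean[] matched = {true, false, true, false};
        OutBatch inner = projectByTransfer(probe, matched, false);
        for (int i = 0; i < inner.recordCount; i++) {
            System.out.println(inner.get(i));
        }
    }
}
```

In this model both paths expose the same logical rows to a downstream reader, but the transfer path never touches the column data: it writes at most one 2-byte index per matched row, regardless of the column's type or width.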