Boaz Ben-Zvi created DRILL-6444:
-----------------------------------
Summary: Hash Join: Avoid partitioning when memory is sufficient
Key: DRILL-6444
URL: https://issues.apache.org/jira/browse/DRILL-6444
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Relational Operators
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
The Hash Join Spilling feature introduced partitioning (of the incoming build
side) which adds some overhead (copying the incoming data, row by row). That
happens even when no spilling is needed.
Suggested optimization: Try reading the incoming build data without
partitioning, while checking that enough memory is available. In case the whole
build side (plus hash table) fits in memory - then continue like a "single
partition". In case not, then need to partition the data read so far and
continue as usual (with partitions).
(See optimization 8.1 in the Hash Join Spill design document:
[https://docs.google.com/document/d/1-c_oGQY4E5d58qJYv_zc7ka834hSaB3wDQwqKcMoSAI/edit]
)
This is currently implemented only for the case of num_partitions = 1 (i.e, no
spilling, and no memory checking).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)