Boaz Ben-Zvi created DRILL-6400:
-----------------------------------
Summary: Hash-Aggr: Avoid recreating common Hash-Table setups for
every partition
Key: DRILL-6400
URL: https://issues.apache.org/jira/browse/DRILL-6400
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Relational Operators
Affects Versions: 1.13.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
Fix For: 1.14.0
The current Hash-Aggr code (and soon the Hash-Join code) creates multiple
partitions to hold the incoming data, each partition with its own HashTable.
The current code invokes the HashTable method _createAndSetupHashTable()_
for *each* partition, but most of the setup done by this method (e.g., code
generation) is identical for all the partitions. Each call has a performance
cost: some local tests measured between 3 and 30 milliseconds per call,
depending on the key columns.
Suggested performance improvement: Extract the common setup so that it is
performed *once*, and have all the partitions reuse the result. When running
with the default 32 partitions, this can yield a measurable improvement: by
the local measurements above, the 32 calls cost roughly 0.1 to 1 second in
total. The savings grow when spilling, since this method is then invoked
again.
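
The suggestion above could be sketched roughly as follows. This is only an
illustration of the "set up once, reuse per partition" idea, not Drill's actual
code: the class names SharedHashTableSetup, PartitionHashTable and
HashAggSketch are hypothetical, and the constructor merely stands in for the
expensive common work (e.g., code generation).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the partition-independent setup (e.g., generated code). */
final class SharedHashTableSetup {
    /** Counts how many times the expensive common setup ran (for illustration only). */
    static int buildCount = 0;

    final List<String> keyColumns;

    SharedHashTableSetup(List<String> keyColumns) {
        buildCount++; // stands in for the costly work that is identical for all partitions
        this.keyColumns = Collections.unmodifiableList(new ArrayList<>(keyColumns));
    }

    /** Cheap per-partition step: only partition-local state is created here. */
    PartitionHashTable newTable(int partitionId) {
        return new PartitionHashTable(partitionId, this);
    }
}

/** Hypothetical per-partition hash table that reuses the shared setup. */
final class PartitionHashTable {
    final int partitionId;
    final SharedHashTableSetup setup;

    PartitionHashTable(int partitionId, SharedHashTableSetup setup) {
        this.partitionId = partitionId;
        this.setup = setup;
    }
}

/** Hypothetical operator-side code: one common setup, then one table per partition. */
final class HashAggSketch {
    static List<PartitionHashTable> createPartitions(List<String> keyColumns,
                                                     int numPartitions) {
        // The common setup is performed once ...
        SharedHashTableSetup setup = new SharedHashTableSetup(keyColumns);
        List<PartitionHashTable> tables = new ArrayList<>(numPartitions);
        for (int i = 0; i < numPartitions; i++) {
            // ... and every partition reuses the same result.
            tables.add(setup.newTable(i));
        }
        return tables;
    }
}
```

In this sketch, a table that has to be recreated after a spill would also call
newTable() on the retained setup object instead of repeating the common work.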