[ 
https://issues.apache.org/jira/browse/DRILL-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6400:
--------------------------------
    Fix Version/s: Future

> Hash-Aggr: Avoid recreating common Hash-Table setups for every partition
> ------------------------------------------------------------------------
>
>                 Key: DRILL-6400
>                 URL: https://issues.apache.org/jira/browse/DRILL-6400
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.13.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>            Priority: Minor
>             Fix For: Future
>
>
> The current Hash-Aggr code (and soon the Hash-Join code) creates multiple 
> partitions to hold the incoming data, each partition with its own HashTable. 
> The current code invokes the HashTable method 
> _createAndSetupHashTable()_ for *each* partition, but most of the setup done 
> by this method (e.g., code generation) is identical for all the partitions. 
> Calling this method has a performance cost (some local tests measured between 
> 3 and 30 milliseconds, depending on the key columns).
> Suggested performance improvement: extract the common setup so that it runs 
> *once*, and have all the partitions reuse the results. When running with 
> the default 32 partitions, this can give a measurable improvement (and if 
> spilling, this method is used again....).
>  
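
The suggestion above can be sketched roughly as follows. This is a minimal illustration of the "set up once, share across partitions" idea and not Drill's actual code: the names CommonSetup, SetupFactory, Partition, createPerPartition and createShared are all hypothetical, and the counter merely stands in for the expensive work (such as code generation) that _createAndSetupHashTable()_ is described as repeating.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedHashTableSetupSketch {

    /** Hypothetical holder for the partition-independent results (e.g. generated code). */
    static final class CommonSetup {
        final List<String> keyColumns;

        CommonSetup(List<String> keyColumns) {
            this.keyColumns = new ArrayList<>(keyColumns);
        }
    }

    /** Hypothetical factory; the counter records how often the costly setup runs. */
    static final class SetupFactory {
        final AtomicInteger expensiveCalls = new AtomicInteger();

        CommonSetup build(List<String> keyColumns) {
            expensiveCalls.incrementAndGet(); // stands in for code generation etc.
            return new CommonSetup(keyColumns);
        }
    }

    /** One partition: it has its own identity but may share a CommonSetup. */
    static final class Partition {
        final int id;
        final CommonSetup setup;

        Partition(int id, CommonSetup setup) {
            this.id = id;
            this.setup = setup;
        }
    }

    /** Pattern described as current: the full setup is repeated for every partition. */
    static List<Partition> createPerPartition(SetupFactory factory, List<String> keys, int numPartitions) {
        List<Partition> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new Partition(i, factory.build(keys)));
        }
        return partitions;
    }

    /** Suggested pattern: run the common setup once and hand the result to every partition. */
    static List<Partition> createShared(SetupFactory factory, List<String> keys, int numPartitions) {
        CommonSetup shared = factory.build(keys);
        List<Partition> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new Partition(i, shared));
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("key_a", "key_b");

        SetupFactory perPartition = new SetupFactory();
        createPerPartition(perPartition, keys, 32);

        SetupFactory shared = new SetupFactory();
        createShared(shared, keys, 32);

        System.out.println("setup calls, per partition: " + perPartition.expensiveCalls.get());
        System.out.println("setup calls, shared: " + shared.expensiveCalls.get());
    }
}
```

In this sketch the shared variant performs the costly step once regardless of the partition count. Whether the real setup can be shared this cleanly depends on how much of it is truly partition-independent, which the description leaves open.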



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
