[ https://issues.apache.org/jira/browse/DRILL-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kunal Khatua updated DRILL-6400:
--------------------------------
    Fix Version/s: Future

> Hash-Aggr: Avoid recreating common Hash-Table setups for every partition
> ------------------------------------------------------------------------
>
>                 Key: DRILL-6400
>                 URL: https://issues.apache.org/jira/browse/DRILL-6400
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.13.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>            Priority: Minor
>             Fix For: Future
>
>
> The current Hash-Aggr code (and soon the Hash-Join code) creates multiple
> partitions to hold the incoming data, each partition with its own HashTable.
> The current code invokes the HashTable method
> _createAndSetupHashTable()_ for *each* partition, but most of the setup done
> by this method is identical for all the partitions (e.g., code generation).
> Calling this method has a performance cost (some local tests measured between
> 3 and 30 milliseconds per call, depending on the key columns).
> Suggested performance improvement: Extract the common setup so that it is
> performed *once*, and have all the partitions reuse the results. When running
> with the default 32 partitions, this can yield a measurable improvement (and
> when spilling, this method is called again, so the cost recurs).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
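
The improvement suggested in the issue description can be illustrated with a minimal Java sketch. This is not Drill's code: `CommonSetup`, `PartitionHashTable`, `createPartitions`, and the build counter are hypothetical names invented for illustration, and the real _createAndSetupHashTable()_ has a different shape. The sketch only shows the pattern, assuming the expensive, partition-independent work can be captured in one immutable object that every partition then shares.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedHashTableSetupSketch {

  /** Hypothetical holder for the partition-independent setup (e.g. generated code). */
  static final class CommonSetup {
    // Counts how many times the expensive setup ran; for demonstration only.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();
    final String keyColumns;

    CommonSetup(String keyColumns) {
      BUILD_COUNT.incrementAndGet(); // stands in for the costly work (3 to 30 ms in the report)
      this.keyColumns = keyColumns;
    }
  }

  /** Hypothetical per-partition table: keeps only its own state plus a shared reference. */
  static final class PartitionHashTable {
    final int partitionIndex;
    final CommonSetup setup;

    PartitionHashTable(int partitionIndex, CommonSetup setup) {
      this.partitionIndex = partitionIndex;
      this.setup = setup;
    }
  }

  /** Runs the common setup once, then hands the same result to every partition. */
  static List<PartitionHashTable> createPartitions(String keyColumns, int numPartitions) {
    CommonSetup shared = new CommonSetup(keyColumns);
    List<PartitionHashTable> tables = new ArrayList<>(numPartitions);
    for (int i = 0; i < numPartitions; i++) {
      tables.add(new PartitionHashTable(i, shared));
    }
    return tables;
  }

  public static void main(String[] args) {
    List<PartitionHashTable> tables = createPartitions("a, b", 32);
    System.out.println(tables.size() + " partitions, "
        + CommonSetup.BUILD_COUNT.get() + " common setup(s)");
  }
}
```

With 32 partitions, the per-call cost would be paid once instead of 32 times under these assumptions. The same shared object could plausibly be reused when a spilled partition is read back and its table is rebuilt, although whether that is safe depends on details the issue does not spell out.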