[ https://issues.apache.org/jira/browse/IMPALA-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-2737.
-----------------------------------
    Resolution: Later

Closing until we have more concrete plans.

> Investigate partition-oriented agg and join processing
> ------------------------------------------------------
>
>                 Key: IMPALA-2737
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2737
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.3.0
>            Reporter: Tim Armstrong
>            Priority: Minor
>              Labels: performance
>         Attachments: partition-oriented-pagg-preview.diff
>
>
> Currently the partitioned aggregations and joins add rows to the partitions
> as they process the input. This leads to poor memory access patterns since
> the 16 different partitions are randomly accessed. An alternative approach is
> to do an initial pass to hash and divide the rows between partitions, then do
> a second pass per partition to insert all the rows for that partition. This
> avoids the random access to partitions.
> This can enable some additional optimisations, e.g. prefetching hash table
> buckets for the next row.
> An initial prototype was posted here: http://gerrit.cloudera.org/#/c/628 .
> The diff is attached.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
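
The two-pass approach described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached prototype or Impala's actual code: the `Row` type, the `PartitionOf` hash, `AggregateByPartition`, and `LookupSum` are all hypothetical names, a SUM aggregation stands in for a general aggregate, and `std::unordered_map` stands in for Impala's own hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// The issue mentions 16 partitions; the constant name is hypothetical.
constexpr std::size_t kNumPartitions = 16;
static_assert(kNumPartitions == 16, "PartitionOf assumes 4 partition bits");

// Hypothetical row: a grouping key and a value to be summed.
struct Row {
  std::int64_t key;
  std::int64_t value;
};

// One hash table per partition, mapping key -> running sum.
using PartitionTables =
    std::vector<std::unordered_map<std::int64_t, std::int64_t>>;

// Assumed hash: multiplicative mixing, top 4 bits select the partition.
std::size_t PartitionOf(std::int64_t key) {
  const std::uint64_t h =
      static_cast<std::uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
  return static_cast<std::size_t>(h >> 60);
}

PartitionTables AggregateByPartition(const std::vector<Row>& rows) {
  // Pass 1: hash every row and append it to its partition's buffer.
  // No hash table is touched yet; the writes are sequential appends.
  std::vector<std::vector<Row>> buffers(kNumPartitions);
  for (const Row& row : rows) {
    const std::size_t p = PartitionOf(row.key);
    assert(p < kNumPartitions);
    buffers[p].push_back(row);
  }

  // Pass 2: process one partition at a time, so all inserts in the inner
  // loop go to the same hash table.
  PartitionTables tables(kNumPartitions);
  for (std::size_t p = 0; p < kNumPartitions; ++p) {
    auto& table = tables[p];
    table.reserve(buffers[p].size());
    for (const Row& row : buffers[p]) {
      table[row.key] += row.value;
    }
  }
  return tables;
}

// Look up the aggregated sum for a key; returns 0 if the key is absent.
std::int64_t LookupSum(const PartitionTables& tables, std::int64_t key) {
  const auto& table = tables[PartitionOf(key)];
  const auto it = table.find(key);
  return it == table.end() ? 0 : it->second;
}
```

Because the second pass knows the next row in the buffer targets the same table, a custom hash table could plausibly prefetch that row's bucket while the current row is being inserted, which is the kind of optimisation the issue mentions. `std::unordered_map` does not expose its buckets in a way that makes this practical, so the sketch leaves that step out.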