[
https://issues.apache.org/jira/browse/KYLIN-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
hongbin ma resolved KYLIN-2501.
-------------------------------
Resolution: Fixed
Fix Version/s: v2.0.0
> Stream Aggregate GTRecords at Query Server
> ------------------------------------------
>
> Key: KYLIN-2501
> URL: https://issues.apache.org/jira/browse/KYLIN-2501
> Project: Kylin
> Issue Type: Improvement
> Components: Query Engine, Storage - HBase
> Affects Versions: v1.6.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
> Fix For: v2.0.0
>
> Attachments: KYLIN-2501.patch
>
>
> *Problem*
> When query server needs to handle millions of records, CubeTupleConverter
> could become performance bottleneck.
> An experiment shows that converting 5 millions records takes ~11s, which
> accounts for 50% of the total query time.
> *Motivation*
> Records returned from each storage partition is guaranteed to be ordered.
> Therefore we could reduce the number of records passed to CubeTupleConverter
> by
> # merge sorted records from all partitions, similar to what we have done in
> KYLIN-1787
> # use a [stream
> aggregate|https://blogs.msdn.microsoft.com/craigfr/2006/09/13/stream-aggregate/]
> algorithm on merged stream to aggregate those records with the same key
> *Proposal*
> # Add a new physical operator GTStreamAggregateScanner which implements the
> stream aggregate algorithm
> # Refine SortedIteratorMergerWithLimit that was used to merge sort records
> from different partitions. The previous implementation has performance issues
> (KYLIN-2483) due to expensive record clone
> # Leverage GTStreamAggregateScanner to aggregate records on merged stream
> *Scope*
> Stream aggregate has some good properties such as low memory usage and
> streamable ordered outputs, making it better than hash/sort based
> alternatives when input is already sorted. So I bet the new
> GTStreamAggregateScanner operator can also be used to accelerate cubing and
> coprocessor aggregation in certain cases. I'll focus on query server in this
> jira and leave those optimizations as future works.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)