[
https://issues.apache.org/jira/browse/PHOENIX-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Taylor updated PHOENIX-1006:
----------------------------------
Assignee: (was: Samarth Jain)
> Do not sort group by rows without order by
> ------------------------------------------
>
> Key: PHOENIX-1006
> URL: https://issues.apache.org/jira/browse/PHOENIX-1006
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: jay wong
> Labels: gsoc2015
> Attachments: PHOENIX-1006.patch, PHOENIX-1006v2.patch
>
>
> Consider a SQL query like the one below, which generates 55000 groups:
> {code}
> SELECT COUNT(1) AS count, SUM(int_column) AS sum_column,
>        MAX(int_column) AS max_column2, MIN(int_column) AS min_column,
>        AVG(int_column) AS avg_column
> FROM table1
> WHERE int_column IS NOT NULL
> GROUP BY int_column2
> ORDER BY int_column DESC
> LIMIT 200;
> {code}
> From AggregatePlan we can see that the *resultIterator* is set to a
> MergeSortRowKeyResultIterator during group by, and the
> MergeSortRowKeyResultIterator needs an OrderedResultIterator. As a result,
> whether or not the _group by_ query has an _order by_, the rows are ALWAYS
> sorted first, which is unnecessary.
> To improve this, we could modify the code so that the order-by iterator is
> not triggered for a group by without an order by, and sort the result within
> each group on the client side instead.
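
As a rough illustration of the idea above (not the attached patch, and not Phoenix's actual classes), the sketch below assumes that per-region partial aggregates can be combined on the client by group key with a hash map, and that a sort is applied only when the query carries an ORDER BY. The names `GroupByMergeSketch`, `Partial`, `mergeByKey`, and `finish` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByMergeSketch {

    /** Hypothetical partial aggregate for one group, as a single region might return it. */
    static final class Partial {
        final String groupKey;
        final long count;
        final long sum;

        Partial(String groupKey, long count, long sum) {
            this.groupKey = groupKey;
            this.count = count;
            this.sum = sum;
        }

        /** Combines two partial aggregates that belong to the same group. */
        Partial combine(Partial other) {
            return new Partial(groupKey, count + other.count, sum + other.sum);
        }
    }

    /** Merges per-region partial aggregates by group key; no sorting is involved. */
    static Map<String, Partial> mergeByKey(List<List<Partial>> perRegion) {
        Map<String, Partial> merged = new HashMap<>();
        for (List<Partial> region : perRegion) {
            for (Partial p : region) {
                merged.merge(p.groupKey, p, Partial::combine);
            }
        }
        return merged;
    }

    /** Sorts only when the query carries an ORDER BY (orderBy != null). */
    static List<Partial> finish(Map<String, Partial> merged, Comparator<Partial> orderBy) {
        List<Partial> rows = new ArrayList<>(merged.values());
        if (orderBy != null) {
            rows.sort(orderBy);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Partial>> perRegion = Arrays.asList(
            Arrays.asList(new Partial("a", 2, 10), new Partial("b", 1, 7)),
            Arrays.asList(new Partial("a", 1, 5)));
        Map<String, Partial> merged = mergeByKey(perRegion);

        // GROUP BY without ORDER BY: rows are returned in whatever order the map yields.
        System.out.println(finish(merged, null).size());

        // GROUP BY with ORDER BY sum DESC: one sort over the already merged groups.
        Comparator<Partial> bySumDesc = Comparator.comparingLong((Partial p) -> p.sum).reversed();
        System.out.println(finish(merged, bySumDesc).get(0).groupKey);
    }
}
```

The hash-map merge is a substitute technique chosen for brevity; whether the real client-side iterators can drop the row-key ordering this way depends on how Phoenix combines groups that span regions.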
> On the other hand, in the group by plus order by case, the sort on the
> RegionServer side is currently triggered sequentially, which causes poor
> performance, especially with a large number of regions. We should improve
> this by getting an element from each scanner earlier, so that the sorts are
> triggered up front and run in parallel.
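
The "get an element from each scanner earlier" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch itself: `RegionScanner` is a hypothetical stand-in for a per-region scanner whose first `next()` call triggers the server-side sort, and `primeAll` is a hypothetical helper that requests that first element from every scanner concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPrimeSketch {

    /** Hypothetical per-region scanner; the first next() is assumed to trigger the sort. */
    interface RegionScanner {
        String next() throws Exception;
    }

    /** Requests the first element from every scanner at once, so the sorts overlap. */
    static List<String> primeAll(List<RegionScanner> scanners) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, scanners.size()));
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (RegionScanner scanner : scanners) {
                Callable<String> firstRow = scanner::next;
                pending.add(pool.submit(firstRow)); // this region starts working now
            }
            List<String> firstRows = new ArrayList<>();
            for (Future<String> future : pending) {
                firstRows.add(future.get()); // waits, but every region is already busy
            }
            return firstRows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Each simulated scanner spends 200 ms "sorting" before returning its first row.
        RegionScanner regionA = () -> { Thread.sleep(200); return "regionA-first"; };
        RegionScanner regionB = () -> { Thread.sleep(200); return "regionB-first"; };
        System.out.println(primeAll(Arrays.asList(regionA, regionB)));
    }
}
```

With sequential triggering the waits add up across regions; with the priming step the total wait is roughly that of the slowest region, which is why the benefit would grow with the region count.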
> For more details, please refer to the attached patch. Any comments or
> suggestions will be highly appreciated.