[ 
https://issues.apache.org/jira/browse/PHOENIX-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013318#comment-14013318
 ] 

jay wong edited comment on PHOENIX-1006 at 5/30/14 5:46 AM:
------------------------------------------------------------


I queried the same table, using the SQL from my issue description.

[~jamestaylor], the "parallelly order" row is your parallel-ordering approach.


||Test scene||Response time||
|No modification|45 s|
|My patch|3 s|
|parallelly order|21 s|



> 8x Performance enhancements in my group by query case.
> ------------------------------------------------------
>
>                 Key: PHOENIX-1006
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1006
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: jay wong
>         Attachments: PHOENIX-1006.patch, PHOENIX-1006v2.patch
>
>
> Assume a SQL query like the one below, which generates 55,000 groups:
> {code}
> SELECT count(1) AS count, SUM(int_column) AS sum_column,
>        MAX(int_column) AS max_column2, MIN(int_column) AS min_column,
>        AVG(int_column) AS avg_column
> FROM table1
> WHERE int_column IS NOT NULL
> GROUP BY int_column2
> ORDER BY int_column DESC
> LIMIT 200;
> {code}
> In AggregatePlan, the *resultIterator* is set to a
> MergeSortRowKeyResultIterator for GROUP BY queries, and
> MergeSortRowKeyResultIterator requires an OrderedResultIterator. As a
> result, whether or not the _group by_ query has an _order by_, the results
> are ALWAYS sorted first, which is unnecessary.
> To improve this, we could modify the code so that the ORDER BY iterator is
> not triggered for a GROUP BY without an ORDER BY, and instead sort the
> results within each group on the client side.
> On the other hand, in the GROUP BY plus ORDER BY case, the sort on the
> RegionServer side is currently triggered sequentially, which causes poor
> performance, especially with a large number of regions. We should improve
> this by fetching an element from each scanner up front to trigger its sort,
> so that the sorts run in parallel.
> For more details, please refer to the attached patch. Any comments or
> suggestions will be highly appreciated.
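
A minimal Java sketch of the parallel-sort idea described above, written for illustration only and not taken from the attached patch. Plain `List<Integer>` values stand in for the rows each region scanner would return, and the class `ParallelRegionSort` and its method `sortAndMerge` are hypothetical names. Every per-region sort is submitted to a thread pool before any row is consumed, and the client then performs a k-way merge of the already-sorted runs, stopping once the LIMIT is reached.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not Phoenix code.
public final class ParallelRegionSort {

    // One already-sorted run plus a read cursor; used as a heap entry.
    private static final class Cursor {
        final List<Integer> run;
        int pos = 0;

        Cursor(List<Integer> run) {
            this.run = run;
        }

        Integer head() {
            return run.get(pos);
        }
    }

    // Sorts every region's rows in parallel, then k-way merges the sorted
    // runs on the "client", returning at most `limit` rows in `order`.
    public static List<Integer> sortAndMerge(List<List<Integer>> regions,
                                             Comparator<Integer> order,
                                             int limit,
                                             int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all per-region sorts up front, rather than sorting
            // region i only when the merge first asks it for a row.
            List<Future<List<Integer>>> pending = new ArrayList<>();
            for (List<Integer> region : regions) {
                pending.add(pool.submit(() -> {
                    List<Integer> copy = new ArrayList<>(region);
                    copy.sort(order);
                    return copy;
                }));
            }

            // Seed the heap with the head of each non-empty sorted run.
            PriorityQueue<Cursor> heap =
                    new PriorityQueue<>((a, b) -> order.compare(a.head(), b.head()));
            for (Future<List<Integer>> f : pending) {
                List<Integer> run = f.get();
                if (!run.isEmpty()) {
                    heap.add(new Cursor(run));
                }
            }

            // Repeatedly take the best head; stop early once LIMIT rows are out.
            List<Integer> out = new ArrayList<>();
            while (!heap.isEmpty() && out.size() < limit) {
                Cursor c = heap.poll();
                out.add(c.head());
                c.pos++;
                if (c.pos < c.run.size()) {
                    heap.add(c);
                }
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sorting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("region sort failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> regions = List.of(
                List.of(5, 1, 9), List.of(7, 3), List.of(8, 2, 6, 4));
        // Mimics ORDER BY ... DESC LIMIT 4 over three regions.
        System.out.println(sortAndMerge(regions, Comparator.reverseOrder(), 4, 3));
        // prints [9, 8, 7, 6]
    }
}
```

Because all sorts start before the merge begins, total sort latency is bounded by the slowest region rather than the sum over regions; the early `limit` check means the merge does only as many heap operations as rows actually returned.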



--
This message was sent by Atlassian JIRA
(v6.2#6252)
