[jira] [Commented] (PHOENIX-3131) improve "order by " performance with aggregated query

James Taylor (JIRA) Tue, 02 Aug 2016 17:23:31 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405078#comment-15405078
 ]


James Taylor commented on PHOENIX-3131:
---------------------------------------

Those two queries are different, [~tuyuri]. The first one:
{code}
select url,sum(pageview) as pv FROM pageview_site where dt > to_date 
('2016-06-01') group by url limit 100 offset 2
{code}
will push the limit to the server and only look at the first 100 groups it 
finds. 

The second one is more of a topN query, but post aggregation:
{code}
select url,sum(pageview) as pv FROM pageview_site where dt > to_date 
('2016-06-01') group by url order by pv desc limit 100 offset 2
{code}
will not be able to push the limit to the server in the same way because the 
ordering is done on the client (as it orders through a merge sort done on the 
client). So, in other words, it needs to go through all groups in order to find 
the top 100.

You could try adding a secondary index on url, pv (asc or desc). Might need to 
order by url, pv desc. Would have to play around with it a bit.


> improve "order by " performance with aggregated query 
> ------------------------------------------------------
>
>                 Key: PHOENIX-3131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3131
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.8.0
>            Reporter: tu nguyen khac
>            Priority: Critical
>
> I created a table in phoenix with query : ( 4 node , ram 8gb, 4 cores / node 
> ) 
> CREATE TABLE pageview_site (
>     url varchar(255) not null,
>     pageview bigint,
>     dt date not null,
>     CONSTRAINT PK PRIMARY KEY (url, dt ROW_TIMESTAMP)
> ) SALT_BUCKETS = 4;
> After that : 
> 1. I tried to upsert about : 13 milions rows to this table . 
> 2. Run 2 queries : 
>     a. select url,sum(pageview) as pv FROM pageview_site where dt > to_date 
> ('2016-06-01') group by url limit 100 offset 2;
> the duration this query  in about : 0.5 second
>     b. select url,sum(pageview) as pv FROM pageview_site where dt > to_date 
> ('2016-06-01') group by ur order by pv descl limit 100 offset 2;
> the duration this query  in about : 9.5 seconds
> what happens with 2nd query ?? I think we should improve performance for 
> "order by " command 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-3131) improve "order by " performance with aggregated query

Reply via email to