[jira] [Comment Edited] (PHOENIX-3131) improve "order by " performance with aggregated query

Ankit Singhal (JIRA) Tue, 02 Aug 2016 04:27:48 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403791#comment-15403791
 ]


Ankit Singhal edited comment on PHOENIX-3131 at 8/2/16 11:27 AM:
-----------------------------------------------------------------

[~tuyuri], Yeah,80k should not be much(as you have 4 regions only, so maximum 
you can get 80k*4 for merge sort too).
So you might be spending much time on server, can you confirm the time taken by 
below query(without limit) , so that we can confirm how much time it take to 
read all and complete region.(And then we can add sorting time(0.3sec *4) to 
this to come to the time of your slow query)
{code}
select url,sum(pageview) as pv FROM pageview_site where dt > to_date 
('2016-06-01') group by url
{code}

Because,  when query is run with limit and group by on first primary key, we do 
optimization that we read first region(if salt is not used) and all region(if 
salt is used) with number of records equal to limit.
{code}
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] LIMIT 102 GROUPS
{code}

against, complete region when limit is not specified

{code}
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL]
{code}




was (Author: [email protected]):
[~tuyuri], Yeah,80k should not be much(as you have 4 regions only, so maximum 
you can get 80k*4 for merge sort too).
So you might be spending much time on server, can you confirm the time taken by 
below query(without limit) , so that we can confirm how much time it take to 
read all and complete region.(And then we can add sorting time(0.3sec *4) to 
this to come to the time of your slow query)
{code}
select url,sum(pageview) as pv FROM pageview_site where dt > to_date 
('2016-06-01') group by url
{code}

Because,  when query is run with limit and group by on first primary key, we do 
optimization that we read first region(if salt is not used) and all region(if 
salt is used) with number of records equal to limit.
{code}
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] LIMIT 102 GROUPS
{code}

against, complete region when limit is not specified

{code}
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] LIMIT 102 GROUPS
{code}



> improve "order by " performance with aggregated query 
> ------------------------------------------------------
>
>                 Key: PHOENIX-3131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3131
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.8.0
>            Reporter: tu nguyen khac
>            Priority: Critical
>
> I created a table in phoenix with query : ( 4 node , ram 8gb, 4 cores / node 
> ) 
> CREATE TABLE pageview_site (
>     url varchar(255) not null,
>     pageview bigint,
>     dt date not null,
>     CONSTRAINT PK PRIMARY KEY (url, dt ROW_TIMESTAMP)
> ) SALT_BUCKETS = 4;
> After that : 
> 1. I tried to upsert about : 13 milions rows to this table . 
> 2. Run 2 queries : 
>     a. select url,sum(pageview) as pv FROM pageview_site where dt > to_date 
> ('2016-06-01') group by url limit 100 offset 2;
> the duration this query  in about : 0.5 second
>     b. select url,sum(pageview) as pv FROM pageview_site where dt > to_date 
> ('2016-06-01') group by ur order by pv descl limit 100 offset 2;
> the duration this query  in about : 9.5 seconds
> what happens with 2nd query ?? I think we should improve performance for 
> "order by " command 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (PHOENIX-3131) improve "order by " performance with aggregated query

Reply via email to