[jira] [Commented] (PHOENIX-3023) Slow performance when limit queries are executed in parallel by default

Samarth Jain (JIRA) Thu, 23 Jun 2016 14:11:33 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347191#comment-15347191 ]


Samarth Jain commented on PHOENIX-3023:
---------------------------------------

I think I understand what is going on here. Earlier, even though we said that 
the query was being executed serially, we used to create scanners for each 
region in parallel. That was fixed as part of my modified implementation of 
SerialIterators in 
https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=54362430d71be788d515944573572624628a09b6;hp=559dfa76bcd622701d789a0f20bd11da11da7412.
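
To make the difference between "serial" and "scanners created in parallel" concrete, here is a minimal sketch of lazy serial iteration. It is not Phoenix code: LazySerialScan, region(), sampleRegions() and scanSerially() are hypothetical names, and the per-region suppliers only stand in for HBase scanners. A region's scanner is opened only after the previous region is exhausted, so a LIMIT 1 satisfied by the first region never opens the other three.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class LazySerialScan {
    // Counts how many (fake) region scanners were actually opened.
    static int scannersOpened = 0;

    /** Hypothetical stand-in for an HBase scanner: opening it bumps the counter. */
    static Supplier<Iterator<String>> region(List<String> rows) {
        return () -> {
            scannersOpened++;
            return rows.iterator();
        };
    }

    /** Four regions, mirroring the 4 salt buckets of WIDE_PK. */
    static List<Supplier<Iterator<String>>> sampleRegions() {
        return List.of(
            region(List.of("a1", "a2")),
            region(List.of("b1", "b2")),
            region(List.of("c1", "c2")),
            region(List.of("d1", "d2")));
    }

    /** Visits regions one at a time, opening each scanner only when needed, and stops at the limit. */
    static List<String> scanSerially(List<Supplier<Iterator<String>>> regions, int limit) {
        List<String> out = new ArrayList<>();
        for (Supplier<Iterator<String>> opener : regions) {
            if (out.size() >= limit) {
                break; // limit already satisfied: the remaining scanners are never opened
            }
            Iterator<String> scanner = opener.get(); // opened lazily, only when needed
            while (scanner.hasNext() && out.size() < limit) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = scanSerially(sampleRegions(), 1);
        System.out.println(rows + ", scanners opened: " + scannersOpened);
    }
}
```

Running main prints `[a1], scanners opened: 1`. Note that plain concatenation like this does not order rows across regions; for ORDER BY on a salted table the per-bucket results still have to be merge sorted, which is the constraint discussed below.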
 

The reason we are seeing a regression now is that with ParallelIterators we 
are pre-creating all the scanners even though we don't need to (the results 
we are interested in could fit in one guidepost, i.e. one scan, per region). 
We need to use ParallelIterators because we need to be able to do a merge sort. 
One way to fix this is to recognize that even though the results could fit in a 
chunk smaller than one guidepost, we still need to do a merge sort (because the 
table is salted, etc.). In such a scenario, we will create scanners in parallel 
for only the first guidepost of each region. Will upload a patch.
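
A minimal sketch of the proposed approach, under stated assumptions: rows within each region are already sorted by key and split into guidepost-sized chunks, and the order is ascending for simplicity (the DESC query in the issue is the mirror image, a reverse scan). FirstGuidepostScan, topN(), scanChunk() and sampleRegions() are hypothetical names, not the ParallelIterators API, and the check for whether the first chunk suffices uses actual chunk sizes where Phoenix would rely on guidepost statistics. When every region's first chunk holds at least LIMIT rows, only those first chunks are scanned in parallel; the client then does a k-way merge sort with a PriorityQueue and applies the limit.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class FirstGuidepostScan {
    // Counts guidepost chunks actually scanned (across all threads).
    static final AtomicInteger chunksScanned = new AtomicInteger();

    /** Hypothetical scan of one guidepost chunk; its rows are already sorted by key. */
    static List<String> scanChunk(List<String> chunk) {
        chunksScanned.incrementAndGet();
        return chunk;
    }

    /** 4 regions (salt buckets), each split into 3 sorted guidepost chunks of 2 rows. */
    static List<List<List<String>>> sampleRegions() {
        return List.of(
            List.of(List.of("a", "e"), List.of("i", "m"), List.of("q", "u")),
            List.of(List.of("b", "f"), List.of("j", "n"), List.of("r", "v")),
            List.of(List.of("c", "g"), List.of("k", "o"), List.of("s", "w")),
            List.of(List.of("d", "h"), List.of("l", "p"), List.of("t", "x")));
    }

    /** ORDER BY key LIMIT n over sorted regions: parallel per-region scans plus a client merge sort. */
    static List<String> topN(List<List<List<String>>> regions, int limit) {
        // If each region's first chunk already holds `limit` rows, later chunks cannot contribute.
        boolean firstChunkSuffices = regions.stream()
            .allMatch(chunks -> chunks.size() <= 1 || chunks.get(0).size() >= limit);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<List<String>> chunks : regions) {
                int n = firstChunkSuffices ? Math.min(1, chunks.size()) : chunks.size();
                futures.add(pool.submit(() -> {
                    List<String> rows = new ArrayList<>();
                    for (int i = 0; i < n; i++) {
                        rows.addAll(scanChunk(chunks.get(i)));
                    }
                    return rows;
                }));
            }
            // Client-side k-way merge sort: the heap holds the current head row of each region.
            PriorityQueue<Map.Entry<String, Iterator<String>>> heap =
                new PriorityQueue<>(Map.Entry.<String, Iterator<String>>comparingByKey());
            for (Future<List<String>> f : futures) {
                Iterator<String> it = f.get().iterator();
                if (it.hasNext()) {
                    heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
                }
            }
            List<String> out = new ArrayList<>();
            while (!heap.isEmpty() && out.size() < limit) {
                Map.Entry<String, Iterator<String>> head = heap.poll();
                out.add(head.getKey());
                Iterator<String> it = head.getValue();
                if (it.hasNext()) {
                    heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
                }
            }
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(topN(sampleRegions(), 1) + ", chunks scanned: " + chunksScanned.get());
    }
}
```

With LIMIT 1, main prints `[a], chunks scanned: 4`, i.e. 4 of the 12 chunks. With LIMIT 3 the first chunks (2 rows each) are too small, so the sketch falls back to scanning every chunk and returns `[a, b, c]`.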

> Slow performance when limit queries are executed in parallel by default
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-3023
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3023
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>
> After 
> [this|https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=54362430d71be788d515944573572624628a09b6]
>  commit, limit queries are executed in parallel, which makes them ~5-10x 
> slower. Providing a serial hint fixes it, though.
> After commit:
> {code}
> select * from WIDE_PK order by mypk DESC limit 1; // this takes ~400ms
> CLIENT 1280-CHUNK 1996304 ROWS 6380181208 BYTES PARALLEL 4-WAY REVERSE FULL 
> SCAN OVER WIDE_PK SERVER 1 ROW LIMIT CLIENT MERGE SORT CLIENT 1 ROW LIMIT
> {code}
> Before commit:
> {code}
> select * from WIDE_PK order by mypk DESC limit 1; // this takes ~40ms
> CLIENT 1280-CHUNK 1996304 ROWS 6380181208 BYTES SERIAL 4-WAY REVERSE FULL 
> SCAN OVER WIDE_PK SERVER 1 ROW LIMIT CLIENT MERGE SORT CLIENT 1 ROW LIMIT
> {code}
> The test was done on a single-node machine running HBase 0.98.17, with 
> phoenix.stats.guidepost.width set to 5000000. The DDL used was:
> {code}
> CREATE TABLE WIDE_PK (MYPK CHAR(500) NOT NULL PRIMARY KEY, CF.column1 INTEGER,
> CF.column2 INTEGER, CF.column3 INTEGER, CF.column4 INTEGER, CF.column5 INTEGER)
> SALT_BUCKETS=4
> {code}
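
For a sense of scale, the numbers in the plans above can be cross-checked with a little arithmetic. This sketch assumes roughly one guidepost per phoenix.stats.guidepost.width bytes of table data and one scan per chunk; GuidepostEstimate and guideposts() are hypothetical helpers. 6380181208 / 5000000 gives 1276 guideposts, close to the 1280 chunks in the plan (the exact chunk count presumably also depends on region boundaries), whereas scanning only the first guidepost of each of the 4 salt buckets needs just 4 scans up front.

```java
public class GuidepostEstimate {
    /** Approximate number of guideposts: one per guidepostWidth bytes (integer division). */
    static long guideposts(long tableBytes, long guidepostWidth) {
        return tableBytes / guidepostWidth;
    }

    public static void main(String[] args) {
        long tableBytes = 6_380_181_208L; // "6380181208 BYTES" from the EXPLAIN output
        long guidepostWidth = 5_000_000L; // phoenix.stats.guidepost.width used in the test
        long planChunks = 1280;           // "CLIENT 1280-CHUNK" from the EXPLAIN output
        int saltBuckets = 4;              // SALT_BUCKETS=4, the "4-WAY" in the plan

        System.out.println("estimated guideposts: " + guideposts(tableBytes, guidepostWidth));
        // Scans created up front: every chunk vs. only the first guidepost per bucket.
        System.out.println("scans up front: " + planChunks + " vs " + saltBuckets
            + " (" + planChunks / saltBuckets + "x fewer)");
    }
}
```

This prints `estimated guideposts: 1276` and `scans up front: 1280 vs 4 (320x fewer)`.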



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
