[ https://issues.apache.org/jira/browse/PHOENIX-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330791#comment-15330791 ]

James Taylor commented on PHOENIX-2982:
---------------------------------------

Sorry for the delayed response, [~junegunn]. We're trying to get our 4.8 
release out, so I've been tied up with that. This indeed looks very interesting 
and promising. Are you in need of this kind of balancing due to usage of our 
Phoenix Query Server? We currently round robin each query, but from your tests 
it looks like this isn't ideal for salted tables. Local indexes are very 
similar to salted tables - any thoughts around using the same technique for 
those?

Another area we've explored is doing the same round-robining on the HBase 
cluster. We explored that in HBASE-12790 (see 
https://reviews.apache.org/r/32447/). I later realized that we don't actually 
need any HBase changes to implement this, since we can set up our own 
RpcSchedulerFactory and RpcScheduler where we could do this ourselves (we do 
that today, but for a different purpose). I'm curious if you've thought about 
this angle at all?
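
To make that angle more concrete, here's a minimal sketch of plugging in our 
own factory (the class and package names are made up, and the 
RpcSchedulerFactory/RpcScheduler signatures should be double-checked against 
the HBase version in use - this is an outline, not a working implementation):

{code:java}
// Hypothetical sketch: a custom factory registered via
// hbase.region.server.rpc.scheduler.factory.class that wraps the stock
// scheduler so we can apply our own dispatch policy on the regionserver.
package org.example;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Abortable;
import org.apache.hadoop.hbase.ipc.PriorityFunction;
import org.apache.hadoop.hbase.ipc.RpcScheduler;
import org.apache.hadoop.hbase.regionserver.RpcSchedulerFactory;
import org.apache.hadoop.hbase.regionserver.SimpleRpcSchedulerFactory;

public class BalancingRpcSchedulerFactory implements RpcSchedulerFactory {

  @Override
  public RpcScheduler create(Configuration conf, PriorityFunction priority, Abortable abortable) {
    // Build the stock scheduler; a real implementation would return a wrapper
    // around it that re-orders or limits dispatch per salt bucket (omitted).
    return new SimpleRpcSchedulerFactory().create(conf, priority, abortable);
  }

  @Override
  public RpcScheduler create(Configuration conf, PriorityFunction priority) {
    return new SimpleRpcSchedulerFactory().create(conf, priority);
  }
}
{code}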

Once 4.8 is out, I'd like to revisit this and give you a more in-depth 
review. [~samarthjain] is likely interested as well.

> Keep the number of active handlers balanced across salt buckets
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2982
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2982
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>         Attachments: PHOENIX-2982-v2.patch, PHOENIX-2982.patch, 
> cpu-util-with-patch.png, cpu-util-without-patch.png
>
>
> I'd like to discuss the idea of keeping the number of active handlers 
> balanced across the salt buckets during parallel scan by exposing the 
> counters to the JobManager queue.
> h4. Background
> I was testing Phoenix on a 10-node test cluster. When I was running a few 
> full-scan queries simultaneously on a table whose {{SALT_BUCKETS}} is 100, I 
> noticed that small queries such as {{SELECT * FROM T LIMIT 100}} were 
> occasionally blocked for up to tens of seconds due to the exhaustion of 
> regionserver handler threads.
> {{hbase.regionserver.handler.count}} was set to 100, which is much larger 
> than the default 30, so I didn't expect this to happen 
> ({{phoenix.query.threadPoolSize}} = 128 / 10 nodes * 4 queries =~ 50 < 100), 
> but from periodic thread dumps I could observe that the number of active 
> handlers across the regionservers was often skewed during execution.
> {noformat}
> # Obtained by periodic thread dumps
> # (key = regionserver ID, value = number of active handlers)
> 17:23:48: {1=>6,  2=>3,  3=>27, 4=>12, 5=>13, 6=>23, 7=>23, 8=>5,  9=>5,  10=>10}
> 17:24:18: {1=>8,  2=>6,  3=>26, 4=>3,  5=>13, 6=>41, 7=>11, 8=>5,  9=>8,  10=>5}
> 17:24:48: {1=>15, 3=>30, 4=>3,  5=>8,  6=>22, 7=>11, 8=>16, 9=>16, 10=>7}
> 17:25:18: {1=>6,  2=>12, 3=>37, 4=>6,  5=>4,  6=>2,  7=>21, 8=>10, 9=>24, 10=>5}
> 17:25:48: {1=>4,  2=>9,  3=>48, 4=>14, 5=>2,  6=>7,  7=>18, 8=>16, 9=>2,  10=>8}
> {noformat}
> Although {{ParallelIterators.submitWork}} shuffles the parallel scan tasks 
> before submitting them to the {{ThreadPoolExecutor}}, there is currently no 
> mechanism to prevent the skew from happening at runtime.
> h4. Suggestion
> Maintain an "active" counter for each salt bucket, and expose the counts to 
> the JobManager queue via a specialized {{Producer}} implementation so that it 
> can choose a scan for the least-loaded bucket.
> By doing so we can prevent the handler-exhaustion problem described above and 
> can expect more consistent utilization of the resources.
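> As a rough illustration of the balancing idea (the class and method names 
> below are hypothetical, not the interfaces used in the attached patch):
> {code:java}
> // Keep one active-scan counter per salt bucket and always hand the next
> // worker a pending scan from the currently least-loaded bucket.
> import java.util.ArrayDeque;
> import java.util.Queue;
> import java.util.concurrent.atomic.AtomicInteger;
>
> class BucketBalancedScanQueue {
>   private final Queue<Runnable>[] pending; // pending scans, grouped by salt bucket
>   private final AtomicInteger[] active;    // currently running scans per salt bucket
>
>   @SuppressWarnings("unchecked")
>   BucketBalancedScanQueue(int saltBuckets) {
>     pending = new Queue[saltBuckets];
>     active = new AtomicInteger[saltBuckets];
>     for (int i = 0; i < saltBuckets; i++) {
>       pending[i] = new ArrayDeque<>();
>       active[i] = new AtomicInteger();
>     }
>   }
>
>   synchronized void submit(int bucket, Runnable scan) {
>     pending[bucket].add(scan);
>   }
>
>   /** Pick a pending scan from the bucket with the fewest active scans. */
>   synchronized Runnable next() {
>     int best = -1;
>     for (int i = 0; i < pending.length; i++) {
>       if (!pending[i].isEmpty()
>           && (best < 0 || active[i].get() < active[best].get())) {
>         best = i;
>       }
>     }
>     if (best < 0) {
>       return null; // nothing pending
>     }
>     final int bucket = best;
>     final Runnable scan = pending[bucket].poll();
>     active[bucket].incrementAndGet();
>     return () -> {
>       try {
>         scan.run();
>       } finally {
>         active[bucket].decrementAndGet(); // scan done; bucket becomes less loaded
>       }
>     };
>   }
> }
> {code}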
> h4. Evaluation
> I measured the response times of a full-scan query with and without the 
> patch. The plan for the query is given as follows:
> {noformat}
> > explain select count(*) from image;
> +---------------------------------------------------------------------------------------------+
> |                                            PLAN                                              |
> +---------------------------------------------------------------------------------------------+
> | CLIENT 2700-CHUNK 1395337221 ROWS 817889379319 BYTES PARALLEL 100-WAY FULL SCAN OVER IMAGE  |
> |     SERVER FILTER BY FIRST KEY ONLY                                                          |
> |     SERVER AGGREGATE INTO SINGLE ROW                                                         |
> +---------------------------------------------------------------------------------------------+
> {noformat}
> Result:
> - Without patch: 260 sec
> - With patch: 220 sec
> And with the patch, CPU utilizations of the regionservers are very stable 
> during the run.
> I'd like to hear what you think about the approach. Please let me know if 
> there are any concerns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
