[ https://issues.apache.org/jira/browse/PHOENIX-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324058#comment-15324058 ]

Hadoop QA commented on PHOENIX-2982:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12809388/PHOENIX-2982-v2.patch
  against master branch at commit e7e8e130b4ec84b32ed507201c1a00939f1d3b7c.
  ATTACHMENT ID: 12809388

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
31 warning messages.

    {color:red}-1 release audit{color}.  The applied patch generated 7 release 
audit warnings (more than the master's current 0 warnings).

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.phoenix.compile.QueryCompilerTest

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/393//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/393//artifact/patchprocess/patchReleaseAuditWarnings.txt
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/393//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/393//console

This message is automatically generated.

> Keep the number of active handlers balanced across salt buckets
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2982
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2982
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>         Attachments: PHOENIX-2982-v2.patch, PHOENIX-2982.patch, 
> cpu-util-with-patch.png, cpu-util-without-patch.png
>
>
> I'd like to discuss the idea of keeping the number of active handlers 
> balanced across the salt buckets during a parallel scan by exposing the 
> counters to the JobManager queue.
> h4. Background
> I was testing Phoenix on a 10-node test cluster. While running a few 
> full-scan queries simultaneously on a table whose {{SALT_BUCKETS}} is 100, I 
> noticed that small queries such as {{SELECT * FROM T LIMIT 100}} were 
> occasionally blocked for up to tens of seconds due to the exhaustion of 
> regionserver handler threads.
> {{hbase.regionserver.handler.count}} was set to 100, which is much larger 
> than the default 30, so I didn't expect this to happen 
> ({{phoenix.query.threadPoolSize}} = 128 / 10 nodes × 4 queries ≈ 51 < 100; 
> see the breakdown after the dump below). But periodic thread dumps showed 
> that the number of active handlers was often heavily skewed across the 
> regionservers during execution.
> {noformat}
> # Obtained by periodic thread dumps
> # (key = regionserver ID, value = number of active handlers)
> 17:23:48: {1=>6,  2=>3,  3=>27, 4=>12, 5=>13, 6=>23, 7=>23, 8=>5,  9=>5,  10=>10}
> 17:24:18: {1=>8,  2=>6,  3=>26, 4=>3,  5=>13, 6=>41, 7=>11, 8=>5,  9=>8,  10=>5}
> 17:24:48: {1=>15, 3=>30, 4=>3,  5=>8,  6=>22, 7=>11, 8=>16, 9=>16, 10=>7}
> 17:25:18: {1=>6,  2=>12, 3=>37, 4=>6,  5=>4,  6=>2,  7=>21, 8=>10, 9=>24, 10=>5}
> 17:25:48: {1=>4,  2=>9,  3=>48, 4=>14, 5=>2,  6=>7,  7=>18, 8=>16, 9=>2,  10=>8}
> {noformat}
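> Unpacking the capacity estimate above (the same numbers, just written out):
> {noformat}
> expected active scans per regionserver
>   = phoenix.query.threadPoolSize / regionservers × concurrent queries
>   = 128 / 10 × 4
>   ≈ 51   (well under the 100 handlers available)
> {noformat}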
> Although {{ParallelIterators.submitWork}} shuffles the parallel scan tasks 
> before submitting them to the {{ThreadPoolExecutor}}, there is currently no 
> mechanism to prevent this skew from developing at runtime; the shuffle only 
> randomizes the initial submission order (see the sketch below).
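> To make the limitation concrete, here is a minimal, hypothetical sketch of 
> shuffle-once submission (illustrative only; this is not the actual 
> {{ParallelIterators.submitWork}} code). The task order is randomized once up 
> front, and the executor then drains its queue with no knowledge of 
> per-bucket load:
> {code:java}
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> 
> // Illustrative only -- not Phoenix's actual submission code.
> public class ShuffleOnceSubmit {
>     public static void main(String[] args) {
>         ExecutorService executor = Executors.newFixedThreadPool(8);
>         List<Runnable> scanTasks = new ArrayList<>();
>         for (int bucket = 0; bucket < 100; bucket++) {
>             final int b = bucket;
>             scanTasks.add(() -> System.out.println("scan bucket " + b));
>         }
>         Collections.shuffle(scanTasks); // one-time randomization at submit
>         for (Runnable task : scanTasks) {
>             executor.submit(task);      // FIFO from here on; no rebalancing
>         }
>         executor.shutdown();
>     }
> }
> {code}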
> h4. Suggestion
> Maintain "active" counter for each salt bucket, and expose the numbers to 
> JobManager queue via specialized {{Producer}} implementation so that it can 
> choose a scan for the least loaded bucket.
> By doing so we can prevent the handler exhaustion problem described above and 
> can expect more consistent utilization of the resources.
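> As a rough sketch of the idea (hypothetical names; this is not the actual 
> patch), the queue could keep an {{AtomicInteger}} per salt bucket and hand 
> out the pending scan whose bucket currently has the fewest active handlers:
> {code:java}
> import java.util.List;
> import java.util.concurrent.atomic.AtomicInteger;
> 
> // Hypothetical sketch -- class and method names are illustrative,
> // not Phoenix's actual Producer/JobManager classes.
> class LeastLoadedPicker {
>     private final AtomicInteger[] active; // active scans per salt bucket
> 
>     LeastLoadedPicker(int saltBuckets) {
>         active = new AtomicInteger[saltBuckets];
>         for (int i = 0; i < saltBuckets; i++) {
>             active[i] = new AtomicInteger();
>         }
>     }
> 
>     /** Pick the pending scan whose bucket is least loaded right now. */
>     BucketScan pickNext(List<BucketScan> pending) {
>         BucketScan best = null;
>         for (BucketScan s : pending) {
>             if (best == null
>                     || active[s.bucket].get() < active[best.bucket].get()) {
>                 best = s;
>             }
>         }
>         if (best != null) {
>             active[best.bucket].incrementAndGet(); // scan becomes active
>         }
>         return best;
>     }
> 
>     /** Call when a scan against the given bucket completes. */
>     void onComplete(int bucket) {
>         active[bucket].decrementAndGet();
>     }
> 
>     static final class BucketScan {
>         final int bucket; // salt bucket this scan targets
>         BucketScan(int bucket) { this.bucket = bucket; }
>     }
> }
> {code}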
> h4. Evaluation
> I measured the response times of a full-scan query with and without the 
> patch. The query plan is as follows:
> {noformat}
> > explain select count(*) from image;
> +---------------------------------------------------------------------------------------------+
> |                                            PLAN                                              |
> +---------------------------------------------------------------------------------------------+
> | CLIENT 2700-CHUNK 1395337221 ROWS 817889379319 BYTES PARALLEL 100-WAY FULL SCAN OVER IMAGE  |
> |     SERVER FILTER BY FIRST KEY ONLY                                                          |
> |     SERVER AGGREGATE INTO SINGLE ROW                                                         |
> +---------------------------------------------------------------------------------------------+
> {noformat}
> Results:
> - Without the patch: 260 sec
> - With the patch: 220 sec
> With the patch, the CPU utilization of the regionservers is also very 
> stable during the run (see the attached {{cpu-util-with-patch.png}} and 
> {{cpu-util-without-patch.png}}).
> I'd like to hear what you think about this approach. Please let me know if 
> there are any concerns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
