Junegunn Choi created PHOENIX-2982:
--------------------------------------

             Summary: Keep the number of active handlers balanced across salt 
buckets
                 Key: PHOENIX-2982
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2982
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Junegunn Choi
            Assignee: Junegunn Choi


I'd like to discuss the idea of keeping the number of active handlers balanced 
across the salt buckets during a parallel scan by exposing the counters to the 
JobManager queue.

h4. Background

I was testing Phoenix on a 10-node test cluster. While running a few full-scan 
queries simultaneously on a table with {{SALT_BUCKETS}} = 100, I noticed that 
small queries such as {{SELECT * FROM T LIMIT 100}} were occasionally blocked 
for up to tens of seconds due to the exhaustion of regionserver handler 
threads.

{{hbase.regionserver.handler.count}} was set to 100, much larger than the 
default 30, so I didn't expect this to happen 
({{phoenix.query.threadPoolSize}} = 128 / 10 nodes * 4 queries =~ 50 < 100), 
but periodic thread dumps showed that the number of active handlers was often 
skewed across the regionservers during execution.

{noformat}
# Obtained by periodic thread dumps
# (key = regionserver ID, value = number of active handlers)
17:23:48: {1=>6,  2=>3,  3=>27, 4=>12, 5=>13, 6=>23, 7=>23, 8=>5,  9=>5,  10=>10}
17:24:18: {1=>8,  2=>6,  3=>26, 4=>3,  5=>13, 6=>41, 7=>11, 8=>5,  9=>8,  10=>5}
17:24:48: {1=>15, 3=>30, 4=>3,  5=>8,  6=>22, 7=>11, 8=>16, 9=>16, 10=>7}
17:25:18: {1=>6,  2=>12, 3=>37, 4=>6,  5=>4,  6=>2,  7=>21, 8=>10, 9=>24, 10=>5}
17:25:48: {1=>4,  2=>9,  3=>48, 4=>14, 5=>2,  6=>7,  7=>18, 8=>16, 9=>2,  10=>8}
{noformat}

Although {{ParallelIterators.submitWork}} shuffles the parallel scan tasks 
before submitting them to the {{ThreadPoolExecutor}}, there's currently no 
mechanism to prevent the skew from developing at runtime.

h4. Suggestion

Maintain an "active" counter for each salt bucket, and expose the counters to 
the JobManager queue via a specialized {{Producer}} implementation so that it 
can choose a scan for the least loaded bucket.

By doing so, we can prevent the handler-exhaustion problem described above and 
can expect more consistent utilization of cluster resources.
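To make the idea concrete, here is a minimal sketch of the bucket-selection logic; the class and method names are hypothetical and not actual Phoenix APIs:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: per-bucket "active" counters, consulted by a
// producer that hands out the pending scan whose salt bucket currently
// has the fewest active scans.
public class BucketLoadBalancer {
    private final AtomicIntegerArray active; // active scans per salt bucket

    public BucketLoadBalancer(int saltBuckets) {
        this.active = new AtomicIntegerArray(saltBuckets);
    }

    // From the buckets that still have pending scans, pick the one
    // with the fewest active scans (first one wins on ties).
    public int pickLeastLoaded(List<Integer> bucketsWithPendingScans) {
        int best = bucketsWithPendingScans.get(0);
        for (int b : bucketsWithPendingScans) {
            if (active.get(b) < active.get(best)) {
                best = b;
            }
        }
        return best;
    }

    public void scanStarted(int bucket)  { active.incrementAndGet(bucket); }
    public void scanFinished(int bucket) { active.decrementAndGet(bucket); }
}
```

The counters are incremented when a scan task starts and decremented when it finishes, so the producer's choice always reflects the current load rather than the submit-time shuffle.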

h4. Evaluation

I measured the response times of a full-scan query with and without the patch. 
The plan for the query is given as follows:

{noformat}
> explain select count(*) from image;
+----------------------------------------------------------------------------------------------+
|                                             PLAN                                             |
+----------------------------------------------------------------------------------------------+
| CLIENT 2700-CHUNK 1395337221 ROWS 817889379319 BYTES PARALLEL 100-WAY FULL SCAN OVER IMAGE   |
|     SERVER FILTER BY FIRST KEY ONLY                                                          |
|     SERVER AGGREGATE INTO SINGLE ROW                                                         |
+----------------------------------------------------------------------------------------------+
{noformat}

Result:

- Without patch: 260 sec
- With patch: 220 sec

And with the patch, the CPU utilization of the regionservers was very stable 
during the run.

I'd like to hear what you think about the approach. Please let me know if 
there are any concerns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
