Junegunn Choi created PHOENIX-2982:
--------------------------------------
Summary: Keep the number of active handlers balanced across salt
buckets
Key: PHOENIX-2982
URL: https://issues.apache.org/jira/browse/PHOENIX-2982
Project: Phoenix
Issue Type: Improvement
Reporter: Junegunn Choi
Assignee: Junegunn Choi
I'd like to discuss the idea of keeping the number of active handlers balanced
across salt buckets during parallel scans by exposing per-bucket counters to
the JobManager queue.
h4. Background
I was testing Phoenix on a 10-node test cluster. While running a few full-scan
queries simultaneously on a table whose {{SALT_BUCKETS}} is 100, I noticed that
small queries such as {{SELECT * FROM T LIMIT 100}} were occasionally blocked
for up to tens of seconds due to the exhaustion of regionserver handler
threads.
{{hbase.regionserver.handler.count}} was set to 100, which is much larger than
the default 30, so I didn't expect this to happen
({{phoenix.query.threadPoolSize}} = 128 / 10 nodes * 4 queries =~ 50 < 100).
However, periodic thread dumps showed that the number of active handlers is
often heavily skewed across the regionservers during execution.
{noformat}
# Obtained by periodic thread dumps
# (key = regionserver ID, value = number of active handlers)
17:23:48: {1=>6, 2=>3, 3=>27, 4=>12, 5=>13, 6=>23, 7=>23, 8=>5, 9=>5, 10=>10}
17:24:18: {1=>8, 2=>6, 3=>26, 4=>3, 5=>13, 6=>41, 7=>11, 8=>5, 9=>8, 10=>5}
17:24:48: {1=>15, 3=>30, 4=>3, 5=>8, 6=>22, 7=>11, 8=>16, 9=>16, 10=>7}
17:25:18: {1=>6, 2=>12, 3=>37, 4=>6, 5=>4, 6=>2, 7=>21, 8=>10, 9=>24, 10=>5}
17:25:48: {1=>4, 2=>9, 3=>48, 4=>14, 5=>2, 6=>7, 7=>18, 8=>16, 9=>2, 10=>8}
{noformat}
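For reference, the per-server counts above were derived from periodic thread
dumps. A rough sketch of how a single server's count can be extracted from a
saved {{jstack}} dump (the RPC handler thread naming varies across HBase
versions, so the pattern here is an assumption to adjust):

```shell
# Count RUNNABLE RPC handler threads in a saved jstack dump.
# In a jstack dump, the thread name line is immediately followed by a
# "java.lang.Thread.State: ..." line, so we remember seeing a handler
# thread name and check the state on the very next line.
count_active_handlers() {
  awk '/RpcServer.*handler/ { name = 1; next }
       name && /java.lang.Thread.State: RUNNABLE/ { count++ }
       { name = 0 }
       END { print count + 0 }' "$1"
}
```

Running this against each regionserver's dump at a fixed interval yields the
per-server active-handler series shown above.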
Although {{ParallelIterators.submitWork}} shuffles the parallel scan tasks
before submitting them to the {{ThreadPoolExecutor}}, there is currently no
mechanism to prevent such skew from developing at runtime.
h4. Suggestion
Maintain an "active" counter for each salt bucket and expose the counts to the
JobManager queue via a specialized {{Producer}} implementation, so that the
queue can hand out a scan for the least-loaded bucket.
This prevents the handler exhaustion problem described above and gives more
consistent resource utilization.
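A minimal sketch of the queueing idea (the class and method names here are
hypothetical illustrations, not the actual Phoenix {{JobManager}}/{{Producer}}
API): keep a per-bucket counter of in-flight scans, and when a worker asks for
work, hand out the pending scan whose bucket currently has the fewest active
scans.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a queue that dispatches the pending scan whose
// salt bucket currently has the fewest active scans.
class LeastLoadedBucketQueue {
    // Number of in-flight scans per salt bucket.
    private final Map<Integer, AtomicInteger> active = new HashMap<>();
    // Bucket ids of queued (not yet started) scans, in submission order.
    private final List<Integer> pending = new ArrayList<>();

    synchronized void submit(int bucket) {
        pending.add(bucket);
        active.putIfAbsent(bucket, new AtomicInteger());
    }

    // Called by a worker thread: take the scan for the least-loaded bucket.
    synchronized int take() {
        int bestIdx = 0;
        for (int i = 1; i < pending.size(); i++) {
            if (active.get(pending.get(i)).get()
                    < active.get(pending.get(bestIdx)).get()) {
                bestIdx = i;
            }
        }
        int bucket = pending.remove(bestIdx);
        active.get(bucket).incrementAndGet();
        return bucket;
    }

    // Called when a scan completes, freeing capacity on its bucket.
    synchronized void done(int bucket) {
        active.get(bucket).decrementAndGet();
    }
}
```

Ties fall back to submission order, so when all buckets are equally loaded the
behavior degenerates to the current shuffled FIFO dispatch.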
h4. Evaluation
I measured the response times of a full-scan query with and without the patch.
The query plan is as follows:
{noformat}
> explain select count(*) from image;
+---------------------------------------------------------------------------------------------+
| PLAN                                                                                        |
+---------------------------------------------------------------------------------------------+
| CLIENT 2700-CHUNK 1395337221 ROWS 817889379319 BYTES PARALLEL 100-WAY FULL SCAN OVER IMAGE |
| SERVER FILTER BY FIRST KEY ONLY                                                             |
| SERVER AGGREGATE INTO SINGLE ROW                                                            |
+---------------------------------------------------------------------------------------------+
{noformat}
Result:
- Without patch: 260 sec
- With patch: 220 sec
With the patch, CPU utilization across the regionservers was also very stable
during the run.
I'd like to hear what you think about the approach. Please let me know if
there are any concerns.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)