[ https://issues.apache.org/jira/browse/HBASE-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328415#comment-15328415 ]
stack commented on HBASE-15971:
-------------------------------
Reverting HBASE-10993 gets the branch-1 throughput almost the same as 0.98. So,
the difference is whether our rpc scheduler does FIFO by default (0.98) or
sort-by-priority (branch-1). This change is all that is needed to make branch-1
the same as 0.98:
{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
index d9d61c1..88258f7 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
@@ -180,8 +180,7 @@ public class SimpleRpcScheduler extends RpcScheduler implements ConfigurationObs
     this.highPriorityLevel = highPriorityLevel;
     this.abortable = server;
-    String callQueueType = conf.get(CALL_QUEUE_TYPE_CONF_KEY,
-      CALL_QUEUE_TYPE_DEADLINE_CONF_VALUE);
+    String callQueueType = conf.get(CALL_QUEUE_TYPE_CONF_KEY, "FIFO");
     float callqReadShare = conf.getFloat(CALL_QUEUE_READ_SHARE_CONF_KEY, 0);
     float callqScanShare = conf.getFloat(CALL_QUEUE_SCAN_SHARE_CONF_KEY, 0);
{code}
There is no 'FIFO' callQueueType; FIFO is just what you get by default if no
other config gets in the way.
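To make that fall-through concrete, here is a minimal sketch. It is not the actual SimpleRpcScheduler code: the class `CallQueueChooser`, its `Call` type and the `newCallQueue` method are hypothetical names made up for illustration. The only thing it mirrors is the behaviour described above: 'deadline' selects a sorted queue, and any other value, 'FIFO' included, lands on plain FIFO.

```java
import java.util.Comparator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

public class CallQueueChooser {
  /** Hypothetical stand-in for an RPC call; only the deadline matters here. */
  public static final class Call {
    public final String name;
    public final long deadline;

    public Call(String name, long deadline) {
      this.name = name;
      this.deadline = deadline;
    }
  }

  /**
   * Only "deadline" is recognized in this sketch. Any other value,
   * "FIFO" included, falls through to an ordinary FIFO queue.
   */
  public static BlockingQueue<Call> newCallQueue(String callQueueType) {
    if ("deadline".equals(callQueueType)) {
      // Sorted queue: the call with the smallest deadline is taken first.
      return new PriorityBlockingQueue<>(16,
          Comparator.comparingLong((Call c) -> c.deadline));
    }
    // Default: calls are taken in arrival order.
    return new LinkedBlockingQueue<>();
  }
}
```

Under these assumptions, `CallQueueChooser.newCallQueue("FIFO")` and `CallQueueChooser.newCallQueue("anything-else")` both return the same kind of queue.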
HBASE-10993 added the 'deadline' rpc queue type. It also made it the default.
This queue type will sort the requests by 'priority' but in the current
implementation -- see AnnotationReadingPriorityFunction#getPriority -- only
long-running scans are affected. When their 'next' comes in, they are
deprioritized by a configurable amount (see
http://blog.cloudera.com/blog/2014/12/new-in-cdh-5-2-improvements-for-running-multiple-workloads-on-a-single-hbase-cluster/
for a nice exposition). HBASE-10993 changed our scheduler from FIFO to be
smarter but the cost turns out to be high (about half the random-read
throughput, going by the numbers on this issue).
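A rough sketch of what "deprioritized by a configurable amount" means for ordering follows. This is an illustration only: the class `DeadlineOrderingSketch`, the `scannerNexts` field and the penalty formula (scanner age capped at `maxDelay`) are assumptions invented for the example, not the formula HBase uses. It shows that calls which are not scan 'next' calls keep their arrival order, while a 'next' from an old scanner is pushed back by at most the configured cap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

public class DeadlineOrderingSketch {
  /** Hypothetical call: arrival sequence plus how many 'next' calls its scanner has made (0 for a get). */
  public static final class Call {
    public final String name;
    public final long arrival;
    public final long scannerNexts;

    public Call(String name, long arrival, long scannerNexts) {
      this.name = name;
      this.arrival = arrival;
      this.scannerNexts = scannerNexts;
    }

    /** Illustrative deadline: arrival plus a penalty that grows with scanner age, capped at maxDelay. */
    long deadline(long maxDelay) {
      return arrival + Math.min(maxDelay, scannerNexts);
    }
  }

  /** Returns call names in the order a deadline-sorted queue would hand them to handlers. */
  public static List<String> drainOrder(List<Call> calls, long maxDelay) {
    PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>(16,
        Comparator.comparingLong((Call c) -> c.deadline(maxDelay))
            .thenComparingLong(c -> c.arrival)); // ties keep arrival order
    queue.addAll(calls);
    List<String> order = new ArrayList<>();
    Call c;
    while ((c = queue.poll()) != null) {
      order.add(c.name);
    }
    return order;
  }
}
```

With `maxDelay` set to 0 in this sketch, the penalty vanishes and the queue behaves like FIFO, except that every call still pays for the sort.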
Chatting w/ [~mbertozzi], he says the sort-on-priority is of minor benefit and
suggests that throttling would be a more effective means of limiting users or
use of a table (HBASE-11598). So, changing our default to go back to FIFO in
branch-1.3 seems the thing to do ([~mantonov] Ok by you?). I can make a patch to
do this and then add on the fastpath patch added above and then branch-1 should
go faster than 0.98. I'd also work on moving stuff out of SimpleRpcExecutor.
Currently it is overloaded with options. Instead I'd have folks enable the
executor type they are interested in. Let SimpleRpcExecutor be explicitly FIFO.
> Regression: Random Read/WorkloadC slower in 1.x than 0.98
> ---------------------------------------------------------
>
> Key: HBASE-15971
> URL: https://issues.apache.org/jira/browse/HBASE-15971
> Project: HBase
> Issue Type: Sub-task
> Components: rpc
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Attachments: 098.hits.png, 098.png, HBASE-15971.branch-1.001.patch,
> Screen Shot 2016-06-10 at 5.08.24 PM.png, Screen Shot 2016-06-10 at 5.08.26
> PM.png, branch-1.hits.png, branch-1.png,
> flight_recording_10172402220203_28.branch-1.jfr,
> flight_recording_10172402220203_29.09820.0.98.20.jfr, handlers.fp.png,
> hits.fp.png, hits.patched1.0.vs.unpatched1.0.vs.098.png, run_ycsb.sh
>
>
> branch-1 is slower than 0.98 doing YCSB random read/workloadC. It seems to be
> doing about 1/2 the throughput of 0.98.
> In branch-1, we have low handler occupancy compared to 0.98. Hacking in a
> reader thread occupancy metric, it is about the same in both. In the parent
> issue, hacking out the scheduler, I am able to get branch-1 to go 3x faster so
> will dig in here.