[jira] [Commented] (HBASE-15971) Regression: Random Read/WorkloadC slower in 1.x than 0.98

stack (JIRA) Mon, 13 Jun 2016 15:14:25 -0700

    [ https://issues.apache.org/jira/browse/HBASE-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328415#comment-15328415 ]


stack commented on HBASE-15971:
-------------------------------

Reverting HBASE-10993 brings branch-1 throughput almost level with 0.98. So the 
difference is whether our rpc scheduler does FIFO by default (0.98) or sorts 
by priority (branch-1). This change is all that is needed to make branch-1 
match 0.98:

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
index d9d61c1..88258f7 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
@@ -180,8 +180,7 @@ public class SimpleRpcScheduler extends RpcScheduler implements ConfigurationObs
     this.highPriorityLevel = highPriorityLevel;
     this.abortable = server;

-    String callQueueType = conf.get(CALL_QUEUE_TYPE_CONF_KEY,
-      CALL_QUEUE_TYPE_DEADLINE_CONF_VALUE);
+    String callQueueType = conf.get(CALL_QUEUE_TYPE_CONF_KEY, "FIFO");
     float callqReadShare = conf.getFloat(CALL_QUEUE_READ_SHARE_CONF_KEY, 0);
     float callqScanShare = conf.getFloat(CALL_QUEUE_SCAN_SHARE_CONF_KEY, 0);
{code}

There is no 'FIFO' callQueueType; the string is just a default that does not 
match 'deadline'. FIFO is what you get when no other config gets in the way.
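To make the fallback concrete, here is a stripped-down sketch of the selection logic the diff relies on. It assumes the scheduler only special-cases the deadline value and uses a plain FIFO queue for anything else; a Map stands in for Hadoop's Configuration, and the key string "callqueue.type" is made up for illustration, not the real CALL_QUEUE_TYPE_CONF_KEY value.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

public class CallQueueSelection {
  // Hypothetical stand-ins for CALL_QUEUE_TYPE_CONF_KEY and
  // CALL_QUEUE_TYPE_DEADLINE_CONF_VALUE; the real strings may differ.
  static final String CALL_QUEUE_TYPE_KEY = "callqueue.type";
  static final String DEADLINE_VALUE = "deadline";

  // Simplified selection: only the deadline value gets a sorted queue;
  // every other string, including an unrecognized "FIFO", gets a FIFO queue.
  static BlockingQueue<Long> newCallQueue(Map<String, String> conf, String defaultType) {
    String type = conf.getOrDefault(CALL_QUEUE_TYPE_KEY, defaultType);
    if (DEADLINE_VALUE.equals(type)) {
      return new PriorityBlockingQueue<>(); // smallest deadline is taken first
    }
    return new LinkedBlockingQueue<>(); // strict arrival order
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>(); // nothing configured
    // branch-1 default: "deadline" applies when the key is unset.
    System.out.println(newCallQueue(conf, DEADLINE_VALUE).getClass().getSimpleName()); // PriorityBlockingQueue
    // Patched default: "FIFO" matches nothing, so we fall through to FIFO.
    System.out.println(newCallQueue(conf, "FIFO").getClass().getSimpleName()); // LinkedBlockingQueue
    // An explicit setting still overrides whichever default is passed.
    conf.put(CALL_QUEUE_TYPE_KEY, DEADLINE_VALUE);
    System.out.println(newCallQueue(conf, "FIFO").getClass().getSimpleName()); // PriorityBlockingQueue
  }
}
{code}

The point of the one-line patch is visible in the second call: changing only the default string flips unconfigured servers to FIFO while leaving anyone who set the key explicitly on the deadline queue.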

HBASE-10993 added the 'deadline' rpc queue type. It also made it the default. 
This queue type sorts requests by 'priority', but in the current 
implementation (see AnnotationReadingPriorityFunction#getPriority) only 
long-running scans are affected: when their 'next' comes in, they are 
deprioritized by a configurable amount (see 
http://blog.cloudera.com/blog/2014/12/new-in-cdh-5-2-improvements-for-running-multiple-workloads-on-a-single-hbase-cluster/
for a nice exposition). HBASE-10993 made our scheduler smarter than FIFO, but 
the cost turns out to be high.
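A minimal sketch of deadline ordering, assuming each call's deadline is its arrival time plus a delay, and that a scanner's 'next' delay grows with the square root of how many 'next' calls it has already made, scaled by a weight and capped. The class, method names, weight and cap here are hypothetical; they illustrate the idea, not HBase's exact code. The sequence-number tiebreak is also an addition: PriorityBlockingQueue by itself does not promise any order among equal elements.

{code}
import java.util.concurrent.PriorityBlockingQueue;

public class DeadlineQueueSketch {
  // One queued RPC call, ordered by deadline (arrival time + delay).
  static final class Call implements Comparable<Call> {
    final String name;
    final long deadlineMs;
    final long seq; // arrival order, used only to break ties

    Call(String name, long deadlineMs, long seq) {
      this.name = name;
      this.deadlineMs = deadlineMs;
      this.seq = seq;
    }

    @Override
    public int compareTo(Call other) {
      int byDeadline = Long.compare(deadlineMs, other.deadlineMs);
      // Equal deadlines fall back to arrival order, i.e. FIFO among peers.
      return byDeadline != 0 ? byDeadline : Long.compare(seq, other.seq);
    }
  }

  // Hypothetical delay for a scanner's next(): sqrt of prior next() calls
  // times a weight, capped at maxDelayMs.
  static long scanDelayMs(long nextCalls, double weight, long maxDelayMs) {
    long delay = Math.round(Math.sqrt(nextCalls * weight));
    return Math.min(delay, maxDelayMs);
  }

  // Enqueue a long scan's next() first, then two gets, and report poll order.
  static String demoOrder() {
    PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>();
    // Scan next() arrives at t=0 after 10,000 prior next() calls: delay 100.
    queue.add(new Call("scan-next", 0 + scanDelayMs(10_000, 1.0, 500), 0));
    // Two point gets arrive later, at t=5, with no extra delay.
    queue.add(new Call("get-a", 5, 1));
    queue.add(new Call("get-b", 5, 2));

    StringBuilder order = new StringBuilder();
    Call c;
    while ((c = queue.poll()) != null) {
      order.append(c.name).append(' ');
    }
    return order.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(demoOrder()); // get-a get-b scan-next
  }
}
{code}

The gets overtake the scan even though it arrived first, which is the behaviour the deadline queue buys. Every enqueue and dequeue now goes through a comparator-ordered heap instead of a plain linked queue.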

Per a chat w/ [~mbertozzi], the sort-on-priority is of minor benefit, and 
throttling (HBASE-11598) would be a more effective means of limiting users or 
use of a table. So, changing our default back to FIFO in branch-1.3 seems the 
thing to do ([~mantonov] OK by you?). I can make a patch to do this, then add 
on the fastpath patch added above, and then branch-1 should go faster than 
0.98. I'd also work on moving stuff out of SimpleRpcExecutor; currently it is 
overloaded with options. Instead I'd have folks enable the executor type they 
are interested in. Let SimpleRpcExecutor be explicitly FIFO.
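One possible reading of "have folks enable the executor type they are interested in" is an explicit, validated choice, where an unknown name fails fast instead of quietly meaning FIFO. The enum and method names below are hypothetical and only illustrate the design direction; they are not an existing HBase API.

{code}
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

public class ExplicitQueueType {
  // Hypothetical set of call queue types an operator picks from explicitly.
  enum CallQueueType {
    FIFO,
    DEADLINE;

    // Case-insensitive parse; an unknown name throws IllegalArgumentException
    // instead of silently falling back to some other behaviour.
    static CallQueueType parse(String name) {
      return valueOf(name.trim().toUpperCase(Locale.ROOT));
    }
  }

  // Each type maps to exactly one queue implementation.
  static BlockingQueue<Long> newQueue(CallQueueType type) {
    switch (type) {
      case DEADLINE:
        return new PriorityBlockingQueue<>();
      case FIFO:
      default:
        return new LinkedBlockingQueue<>();
    }
  }

  public static void main(String[] args) {
    System.out.println(newQueue(CallQueueType.parse("fifo")).getClass().getSimpleName()); // LinkedBlockingQueue
    try {
      CallQueueType.parse("fifo-ish");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: fifo-ish");
    }
  }
}
{code}

Keeping FIFO as a named, first-class type makes the default visible in configuration rather than an accident of which string fails to match.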

> Regression: Random Read/WorkloadC slower in 1.x than 0.98
> ---------------------------------------------------------
>
>                 Key: HBASE-15971
>                 URL: https://issues.apache.org/jira/browse/HBASE-15971
>             Project: HBase
>          Issue Type: Sub-task
>          Components: rpc
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>         Attachments: 098.hits.png, 098.png, HBASE-15971.branch-1.001.patch, 
> Screen Shot 2016-06-10 at 5.08.24 PM.png, Screen Shot 2016-06-10 at 5.08.26 
> PM.png, branch-1.hits.png, branch-1.png, 
> flight_recording_10172402220203_28.branch-1.jfr, 
> flight_recording_10172402220203_29.09820.0.98.20.jfr, handlers.fp.png, 
> hits.fp.png, hits.patched1.0.vs.unpatched1.0.vs.098.png, run_ycsb.sh
>
>
> branch-1 is slower than 0.98 doing YCSB random read/workloadC. It seems to be 
> doing about half the throughput of 0.98.
> In branch-1, we have low handler occupancy compared to 0.98. Hacking in a 
> reader thread occupancy metric shows it is about the same in both. In the 
> parent issue, hacking out the scheduler, I am able to get branch-1 to go 3x 
> faster, so I will dig in here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
