[ 
https://issues.apache.org/jira/browse/HBASE-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132868#comment-15132868
 ] 

stack commented on HBASE-15213:
-------------------------------

I love your analysis [~junegunn]. Thank you. The patch in HBASE-14465 bypasses 
mvcc rather than digging into why mvcc is so slow (it seemed 'obvious' that 
mvcc's region scope, with some friction caused by i/o, made it act as a 
region-scoped 'lock' under high concurrency). Your analysis helps clear up a 
mystery I was going to dig into: why master doesn't seem to suffer so much 
from slow increment. Thanks.

bq. ...but it wasn't clear to me what caused the slowdown...

It's not clear how mvcc can act as an effective region-scoped lock? You seem 
to be looking more at why there is friction in mvcc... nice.

bq. ...branch-1.0 shows a much higher number of context switches even with much 
lower throughput....

Were you looking at cs for the whole hbase app? I am interested in how you 
attributed cs to mvcc, or was it just a general observation?

bq. Since a WriteEntry is marked complete after the wait-loop, only one entry 
can be removed at a time.

Can you show me what you are referring to? Is this before HBASE-15031?

bq. So what I tried here is to mark WriteEntry complete before we go into 
wait-loop. With the change, multiple WriteEntries can be shifted at a time 
without context switches. 

You say master is doing this (it seems so, yes).

bq. I changed writeQueue to LinkedHashSet since fast containment check is 
needed as WriteEntry can be removed by any handler.

We create an iterator each time through. Do you mean when you do the remove? 
Otherwise we are just getting the first entry. Or perhaps you are referring to 
somewhere else, because LinkedList just returns its pointer to first when we 
loop through. Asking because I am wondering if we should pull this change into 
master.
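
To make the LinkedList versus LinkedHashSet question concrete, here is a small sketch using only the standard java.util collections. The class name QueueContainmentSketch and the helper pollFirst are hypothetical, made up for illustration; none of this is taken from HBase or from the attached patch. It only shows that head access is cheap on both, while containment checks and removal of an arbitrary entry are a linear scan on LinkedList and a hash lookup on LinkedHashSet.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.LinkedList;

// Hypothetical illustration; not code from HBase or from the attached patch.
final class QueueContainmentSketch {

    /** Removes and returns the oldest element of an insertion-ordered set, or null if it is empty. */
    static <T> T pollFirst(LinkedHashSet<T> set) {
        Iterator<T> it = set.iterator(); // a fresh iterator is created on every call
        if (!it.hasNext()) {
            return null;
        }
        T first = it.next();
        it.remove();
        return first;
    }

    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<String>();
        LinkedHashSet<String> set = new LinkedHashSet<String>();
        for (String entry : new String[] {"w1", "w2", "w3"}) {
            list.add(entry);
            set.add(entry);
        }

        // Head access: LinkedList follows its 'first' pointer; the set needs an iterator.
        System.out.println(list.peekFirst());
        System.out.println(set.iterator().next());

        // Containment check for an entry that is not at the head.
        System.out.println(list.contains("w3")); // walks the list, O(n)
        System.out.println(set.contains("w3"));  // hash lookup, O(1) expected

        // Removal of an arbitrary entry, e.g. by a handler that does not own the head.
        list.remove("w2"); // O(n) search, then unlink
        set.remove("w2");  // O(1) expected

        // Both containers keep insertion order for the entries that remain.
        System.out.println(list.pollFirst() + " " + pollFirst(set));
    }
}
```

If entries are only ever taken from the front by their owner, LinkedList is enough; the set only pays off once any handler may need to ask whether a given entry is still queued.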

Thanks for taking the time to dig in here [~junegunn] Nice work.

> Fix increment performance regression caused by HBASE-8763 on branch-1.0
> -----------------------------------------------------------------------
>
>                 Key: HBASE-15213
>                 URL: https://issues.apache.org/jira/browse/HBASE-15213
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>         Attachments: HBASE-15213.branch-1.0.patch
>
>
> This is an attempt to fix the increment performance regression caused by 
> HBASE-8763 on branch-1.0.
> I'm aware that hbase.increment.fast.but.narrow.consistency was added to 
> branch-1.0 (HBASE-15031) to address the issue and that separate work is 
> ongoing on the master branch, but anyway, this is my take on the problem.
> I read through HBASE-14460 and HBASE-8763 but it wasn't clear to me what 
> caused the slowdown. I could, however, reproduce the performance regression.
> Test setup:
> - Server: 4-core Xeon 2.4GHz Linux server running mini cluster (100 handlers, 
> JDK 1.7)
> - Client: Another box of the same spec
> - Increments on random 10k records on a single-region table, recreated every 
> time
> Increment throughput (TPS):
> || Num threads || Before HBASE-8763 (d6cc2fb) || branch-1.0 || branch-1.0 (narrow-consistency) ||
> || 1            | 2661                         | 2486        | 2359  |
> || 2            | 5048                         | 5064        | 4867  |
> || 4            | 7503                         | 8071        | 8690  |
> || 8            | 10471                        | 10886       | 13980 |
> || 16           | 15515                        | 9418        | 18601 |
> || 32           | 17699                        | 5421        | 20540 |
> || 64           | 20601                        | 4038        | 25591 |
> || 96           | 19177                        | 3891        | 26017 |
> We can clearly observe that the throughput on branch-1.0 degrades as we 
> increase the number of concurrent requests, which led me to believe that 
> there's severe context switching overhead. I could indirectly confirm that 
> suspicion with the cs column in vmstat output: branch-1.0 shows a much higher 
> number of context switches even with much lower throughput.
> Here are the observations:
> - A WriteEntry in the writeQueue can only be removed by the very handler 
> that put it there, and only when it is at the front of the queue and marked 
> complete.
> - Since a WriteEntry is marked complete after the wait-loop, only one entry 
> can be removed at a time.
> - This stringent condition causes O(N^2) context switches, where N is the 
> number of concurrent handlers processing requests.
> So what I tried here is to mark WriteEntry complete before we go into 
> wait-loop. With the change, multiple WriteEntries can be shifted at a time 
> without context switches. I changed writeQueue to LinkedHashSet since fast 
> containment check is needed as WriteEntry can be removed by any handler.
> The numbers look good; they are virtually identical to the pre-HBASE-8763 era.
> || Num threads || branch-1.0 with fix ||
> || 1            | 2459                 |
> || 2            | 4976                 |
> || 4            | 8033                 |
> || 8            | 12292                |
> || 16           | 15234                |
> || 32           | 16601                |
> || 64           | 19994                |
> || 96           | 20052                |
> So what do you think about it? Please let me know if I'm missing anything.
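
The idea in the description above (mark a WriteEntry complete before entering the wait-loop, and let whichever handler arrives next shift every completed entry off the front) can be sketched as follows. This is a simplified model under stated assumptions, not the attached patch and not the HBase mvcc class: MvccSketch and its methods begin, complete, waitForRead, readPoint and isQueued are hypothetical names, and a single monitor stands in for whatever locking the real code uses.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

// Hypothetical, simplified model of the idea; not the attached patch or HBase's mvcc class.
final class MvccSketch {

    static final class WriteEntry {
        final long writeNumber;
        boolean completed; // guarded by the MvccSketch monitor

        WriteEntry(long writeNumber) {
            this.writeNumber = writeNumber;
        }
    }

    // Insertion-ordered set: cheap head iteration plus a fast containment check.
    private final LinkedHashSet<WriteEntry> writeQueue = new LinkedHashSet<WriteEntry>();
    private long nextWriteNumber = 1;
    private long readPoint = 0;

    /** Starts a write and queues its entry. */
    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(nextWriteNumber++);
        writeQueue.add(e);
        return e;
    }

    /**
     * Marks the entry complete BEFORE any waiting happens, then shifts every
     * completed entry off the front, whoever created it. Returns the number removed.
     */
    synchronized int complete(WriteEntry e) {
        e.completed = true;
        int removed = 0;
        Iterator<WriteEntry> it = writeQueue.iterator();
        while (it.hasNext()) {
            WriteEntry head = it.next();
            if (!head.completed) {
                break; // an earlier write is still in flight
            }
            readPoint = head.writeNumber;
            it.remove();
            removed++;
        }
        if (removed > 0) {
            notifyAll(); // one wake-up covers every entry shifted in this pass
        }
        return removed;
    }

    /** The wait-loop: blocks until the read point has caught up with the entry. */
    synchronized void waitForRead(WriteEntry e) throws InterruptedException {
        while (readPoint < e.writeNumber) {
            wait();
        }
    }

    synchronized long readPoint() {
        return readPoint;
    }

    synchronized boolean isQueued(WriteEntry e) {
        return writeQueue.contains(e);
    }

    public static void main(String[] args) throws InterruptedException {
        MvccSketch mvcc = new MvccSketch();
        WriteEntry e1 = mvcc.begin();
        WriteEntry e2 = mvcc.begin();
        WriteEntry e3 = mvcc.begin();

        System.out.println(mvcc.complete(e3)); // 0: e1 is still at the front
        System.out.println(mvcc.complete(e2)); // 0: e1 is still at the front
        System.out.println(mvcc.complete(e1)); // 3: all three are shifted in one pass
        System.out.println(mvcc.readPoint());  // 3

        mvcc.waitForRead(e3); // returns immediately, the read point already covers e3
    }
}
```

In this model the handler that completes e1 removes e2 and e3 as well, so their owners find the read point already advanced when they reach the wait-loop and do not have to be woken once per entry.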



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
