[ https://issues.apache.org/jira/browse/HBASE-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133472#comment-15133472 ]

Junegunn Choi commented on HBASE-15213:
---------------------------------------

bq. Its not clear how the mvcc can act as an effective region-scoped lock? You 
seem to be more looking at why the friction in mvcc... nice.

It was clear, but the mvcc was already acting that way before HBASE-8763, so 
I couldn't see how HBASE-8763 made things worse than before. That's why I had 
to run some tests.

bq. You were looking at cs for the whole hbase app? I am interested in how you 
attributed cs to mvcc or was it just general observation?

The whole app. I probably should have broken down the numbers, but I had this 
hypothesis about MVCC in mind and the patch worked as expected, so I jumped to 
the conclusion. FYI, here are the roughly measured cs numbers (from vmstat):

- less than 1K cs with no workload
- 280K cs before HBASE-8763 at approx. 20,000 TPS
- 450K cs just after HBASE-8763 at 2,000 TPS
- 430K cs on branch-1.0 at 3,800 TPS
- 240K cs on branch-1.0 with this patch at 20,000 TPS
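
For reference, the cs column of vmstat is the per-second delta of the 
cumulative "ctxt" counter in /proc/stat. A minimal Java sampler, just for 
illustration (I used vmstat itself):

{code:java}
import java.io.BufferedReader;
import java.io.FileReader;

public class CtxtSample {
  // Reads the cumulative context-switch count from /proc/stat ("ctxt N").
  static long readCtxt() throws Exception {
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/stat"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("ctxt ")) {
          return Long.parseLong(line.substring(5).trim());
        }
      }
    }
    throw new IllegalStateException("no ctxt line in /proc/stat");
  }

  public static void main(String[] args) throws Exception {
    long before = readCtxt();
    Thread.sleep(1000);
    System.out.println("cs/s: " + (readCtxt() - before)); // per-second delta
  }
}
{code}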

bq. Can you show me to what you are referring too? Is this before HBASE-15031?

It's still the case after HBASE-15031. We call advanceMemstore() only after 
we break out of the wait-loop:

https://github.com/apache/hbase/blob/b43442c/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235

then the entry is marked complete:

https://github.com/apache/hbase/blob/b43442c/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L153
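
To illustrate, here is a simplified sketch of that flow on branch-1.0 
(paraphrased from the linked code, not a verbatim copy):

{code:java}
// Simplified sketch, branch-1.0: a handler waits until ITS OWN entry
// reaches the head of writeQueue, and only then marks it complete.
public void waitForPreviousTransactionsComplete(WriteEntry w)
    throws InterruptedException {
  synchronized (writeQueue) {
    // Head-only check: safe because entries are never removed while
    // their handler is still waiting.
    while (!writeQueue.isEmpty() && writeQueue.getFirst() != w) {
      writeQueue.wait();
    }
  }
  advanceMemstore(w); // completion happens only after the wait-loop
}

void advanceMemstore(WriteEntry e) {
  synchronized (writeQueue) {
    e.markCompleted();
    // The removeFirst loop is meant to drain a run of completed entries,
    // but since completion only ever happens right here, at most one
    // entry (e itself, once at the head) is actually removable.
    while (!writeQueue.isEmpty() && writeQueue.getFirst().isCompleted()) {
      writeQueue.removeFirst();
    }
    writeQueue.notifyAll(); // wakes every waiting handler
  }
}
{code}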

bq. Do you mean when you do the remove because we are just getting first entry 
or perhaps you are referring to elsewhere because LinkedList is just returning 
its pointer to first when we loop through.

It's related to the above observation that a WriteEntry is marked complete 
only after the wait-loop. The entries in the writeQueue are not marked 
complete until advanceMemstore, so bulk removal is not possible. I mean the 
removeFirst loop in advanceMemstore is not working as expected: at any given 
moment only the first entry can be complete, so the loop only ever removes 
one entry. On the other hand, this is exactly what makes head-only checking 
in the wait-loop safe, since a waiting handler can always be sure that its 
entry has not been removed by another handler:

https://github.com/apache/hbase/blob/b43442c/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L218-L221

With the patch this is no longer true: "w" could have been removed by another 
handler while this handler is waiting for it to come to the head of the 
queue. That's why the wait-loop now checks containment instead of head 
identity.
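
Roughly, the patched flow becomes the following (a sketch of the idea, not 
the literal diff):

{code:java}
// Sketch of the patched idea (not the literal diff): writeQueue becomes a
// LinkedHashSet so any handler can shift completed entries in bulk, and a
// waiting handler checks membership instead of head identity.
private final java.util.LinkedHashSet<WriteEntry> writeQueue =
    new java.util.LinkedHashSet<WriteEntry>();

public void waitForPreviousTransactionsComplete(WriteEntry w)
    throws InterruptedException {
  synchronized (writeQueue) {
    w.markCompleted();  // complete BEFORE waiting, not after
    advanceMemstore();  // may shift a whole run of completed entries at once
    // "w" may be removed by another handler while we wait, so check
    // containment; LinkedHashSet makes this check O(1).
    while (writeQueue.contains(w)) {
      writeQueue.wait();
    }
  }
}

private void advanceMemstore() {
  synchronized (writeQueue) {
    // Remove the longest completed prefix in one pass; LinkedHashSet
    // iterates in insertion order, so this still respects queue order.
    java.util.Iterator<WriteEntry> it = writeQueue.iterator();
    while (it.hasNext()) {
      if (!it.next().isCompleted()) break;
      it.remove();
    }
    writeQueue.notifyAll();
  }
}
{code}

When a handler's entry is removed as part of another handler's prefix, every 
entry ahead of it must already be complete, so waiting on containment 
preserves the original semantics.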

bq. Asking because wondering if we should pull this change to master?

I don't think we should; we have HBASE-12751 there instead.

bq. How were you testing it

- Before HBASE-8763: d6cc2fb
- branch-1.0: b43442c (tip of the branch as of yesterday)

I didn't use IncrementPerformanceTest; instead I wrote a simple client 
program. It's written in Clojure with a custom API wrapper, but it's simple 
and should be straightforward to follow.

https://gist.github.com/junegunn/1d53b795b22fe9939dc7
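
For those who'd rather read Java, here is a rough equivalent of what the 
client does (a hypothetical sketch; the table name "test", family "f", 
qualifier "q", and the fixed per-thread count are placeholders, not the 
exact gist):

{code:java}
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementLoad {
  public static void main(String[] args) throws Exception {
    final int threads = Integer.parseInt(args[0]);
    final int perThread = 10000; // placeholder workload size
    final Configuration conf = HBaseConfiguration.create();
    final Connection conn = ConnectionFactory.createConnection(conf);
    final CountDownLatch done = new CountDownLatch(threads);
    long start = System.currentTimeMillis();
    for (int i = 0; i < threads; i++) {
      new Thread(new Runnable() {
        @Override public void run() {
          try (Table table = conn.getTable(TableName.valueOf("test"))) {
            Random rnd = new Random();
            for (int n = 0; n < perThread; n++) {
              // Increment one of 10k random rows on a single-region table
              table.incrementColumnValue(Bytes.toBytes(rnd.nextInt(10000)),
                  Bytes.toBytes("f"), Bytes.toBytes("q"), 1L);
            }
          } catch (Exception e) {
            e.printStackTrace();
          } finally {
            done.countDown();
          }
        }
      }).start();
    }
    done.await();
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("TPS: " + (threads * perThread * 1000L / elapsed));
    conn.close();
  }
}
{code}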

So anyway, here are the fresh IncrementPerformanceTest numbers I just got, 
using the same configuration as above: a 4-core server and a separate 4-core 
client machine.

- b43442c without fix: 75th=24.016, 95th=35.669, 99th=43.032
- b43442c with fix: 75th=4.870, 95th=7.170, 99th=8.726

> Fix increment performance regression caused by HBASE-8763 on branch-1.0
> -----------------------------------------------------------------------
>
>                 Key: HBASE-15213
>                 URL: https://issues.apache.org/jira/browse/HBASE-15213
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>         Attachments: HBASE-15213.branch-1.0.patch
>
>
> This is an attempt to fix the increment performance regression caused by 
> HBASE-8763 on branch-1.0.
> I'm aware that hbase.increment.fast.but.narrow.consistency was added to 
> branch-1.0 (HBASE-15031) to address the issue and that separate work is 
> ongoing on the master branch, but anyway, this is my take on the problem.
> I read through HBASE-14460 and HBASE-8763, but it wasn't clear to me what 
> caused the slowdown. I could, however, reproduce the performance regression.
> Test setup:
> - Server: 4-core Xeon 2.4GHz Linux server running a mini cluster (100 
> handlers, JDK 1.7)
> - Client: another box of the same spec
> - Increments on 10k random records in a single-region table, recreated 
> every time
> Increment throughput (TPS):
> || Num threads || Before HBASE-8763 (d6cc2fb) || branch-1.0 || branch-1.0 (narrow-consistency) ||
> || 1            | 2661                         | 2486        | 2359  |
> || 2            | 5048                         | 5064        | 4867  |
> || 4            | 7503                         | 8071        | 8690  |
> || 8            | 10471                        | 10886       | 13980 |
> || 16           | 15515                        | 9418        | 18601 |
> || 32           | 17699                        | 5421        | 20540 |
> || 64           | 20601                        | 4038        | 25591 |
> || 96           | 19177                        | 3891        | 26017 |
> We can clearly observe that the throughput degrades as we increase the 
> number of concurrent requests, which led me to believe that there's severe 
> context-switching overhead. I could indirectly confirm that suspicion with 
> the cs entry in vmstat output: branch-1.0 shows a much higher number of 
> context switches even at much lower throughput.
> Here are the observations:
> - A WriteEntry in the writeQueue can only be removed by the very handler 
> that put it, and only when it is at the front of the queue and marked 
> complete.
> - Since a WriteEntry is marked complete only after the wait-loop, only one 
> entry can be removed at a time.
> - This stringent condition causes O(N^2) context switches, where N is the 
> number of concurrent handlers processing requests: each removal wakes every 
> waiting handler, but only one of them can make progress.
> So what I tried here is to mark a WriteEntry complete before we go into the 
> wait-loop. With the change, multiple WriteEntries can be shifted at a time 
> without context switches. I changed writeQueue to a LinkedHashSet since a 
> fast containment check is needed now that a WriteEntry can be removed by 
> any handler.
> The numbers look good; throughput is virtually identical to the 
> pre-HBASE-8763 era.
> || Num threads || branch-1.0 with fix ||
> || 1            | 2459                 |
> || 2            | 4976                 |
> || 4            | 8033                 |
> || 8            | 12292                |
> || 16           | 15234                |
> || 32           | 16601                |
> || 64           | 19994                |
> || 96           | 20052                |
> So what do you think about it? Please let me know if I'm missing anything.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
