stack commented on HBASE-16698:

Let me answer [~enis]. I just went through this issue and the patch again.

 * Our write path has gone through a bunch of changes. Some were discrete
steps (the Xiaomi redo, the intro of the ringbuffer). Others were evolutions
(the reordering, because we now rely on mvcc instead of row locks). It can be
hard to keep it all straight. For example, [~allan163]'s comment above is
against 1.1 but [~carp84]'s patch is for the next version on (1.2), though
patched back to 1.1.
 * Agree we should pick an approach with a fall-back, just in case. The patch
here has that. The patch also has the benefit of having been run in
production, showing good numbers.
 * The lock is region-scoped. It is not across the ringbuffer. The RB can make 
progress on other region appends.
 * The perf gain looks to be the result of two phenomena: 1. parallelism: a
single thread stamping every edit with a sequence id, marching in order over
all appends and having to cross a region-scoped synchronize on each stamp,
looks to be slower than stamping done with some parallelism, where each
handler does its own stamp, even though there is friction because each handler
has to contend on the reentrant lock with other handlers in the same region
trying to do the same thing; and 2. no-wait: with the new patch, the handler
can make progress right after calling append, where before it could not until
the RB consumer on the other side of the RB had let go of the latch.
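
To make the two points above concrete, here is a minimal sketch of the patched idea, not the actual HBase code. The class, the {{Region}} and {{Entry}} types, and the {{LinkedBlockingQueue}} standing in for the ringbuffer are all hypothetical. Each handler takes its region's reentrant lock, stamps its own sequence id, publishes, and returns without waiting on the consumer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

class RegionScopedStampSketch {
    /** Hypothetical stand-in for a WAL entry: region name plus its stamped sequence id. */
    static final class Entry {
        final String region;
        final long seqId;
        Entry(String region, long seqId) { this.region = region; this.seqId = seqId; }
    }

    /** Hypothetical per-region state: a lock and the next sequence id to hand out. */
    static final class Region {
        final String name;
        final ReentrantLock lock = new ReentrantLock();
        long nextSeqId = 1; // only touched while holding lock
        Region(String name) { this.name = name; }
    }

    // Stand-in for the ringbuffer: many producers (handlers), one consumer (WAL writer).
    final BlockingQueue<Entry> ringBuffer = new LinkedBlockingQueue<>();

    /** Handler side: stamp and publish under the region lock, then return without waiting. */
    long append(Region region) {
        region.lock.lock();
        try {
            long seqId = region.nextSeqId++;
            // Publishing inside the lock keeps buffer order equal to seqId order per region.
            ringBuffer.add(new Entry(region.name, seqId));
            return seqId;
        } finally {
            region.lock.unlock();
        }
    }

    /** Runs several handler threads per region, then drains the buffer in FIFO order. */
    static List<Entry> run(int regions, int handlersPerRegion, int appendsPerHandler) {
        RegionScopedStampSketch wal = new RegionScopedStampSketch();
        List<Thread> handlers = new ArrayList<>();
        for (int r = 0; r < regions; r++) {
            Region region = new Region("region-" + r);
            for (int h = 0; h < handlersPerRegion; h++) {
                Thread t = new Thread(() -> {
                    for (int i = 0; i < appendsPerHandler; i++) {
                        wal.append(region);
                    }
                });
                handlers.add(t);
                t.start();
            }
        }
        for (Thread t : handlers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        List<Entry> drained = new ArrayList<>();
        wal.ringBuffer.drainTo(drained);
        return drained;
    }

    /** True if, within each region, sequence ids appear in the buffer as 1, 2, 3, ... */
    static boolean inOrderPerRegion(List<Entry> entries) {
        Map<String, Long> last = new HashMap<>();
        for (Entry e : entries) {
            long prev = last.getOrDefault(e.region, 0L);
            if (e.seqId != prev + 1) {
                return false;
            }
            last.put(e.region, e.seqId);
        }
        return true;
    }
}
```

Handlers on different regions never touch the same lock, so contention in this sketch stays inside a region, and the single consumer still sees each region's edits in sequence-id order.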

The RB is good as the transport between N handlers and the single WAL writer.
The notion that the single consumer manages sequenceid assignment in line with
the appends to the WAL, while appealing because of its simplicity, seems to
hold up throughput because our sequenceid is by region.
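
For contrast, a rough sketch of the consumer-stamps arrangement described above, again with hypothetical names (only {{seqNumAssignedLatch}} is taken from the issue description) and a {{LinkedBlockingQueue}} standing in for the ringbuffer. The handler publishes its entry and then has to park on the latch until the single consumer gets around to stamping it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class LatchedStampSketch {
    /** Hypothetical entry whose sequence id is filled in later by the consumer. */
    static final class Entry {
        final CountDownLatch seqNumAssignedLatch = new CountDownLatch(1);
        volatile long seqId = -1;

        /** Consumer side: assign the id, then release any waiting handler. */
        void stamp(long id) {
            seqId = id;
            seqNumAssignedLatch.countDown();
        }

        /** Handler side: block until stamped; returns -1 on timeout or interrupt. */
        long awaitSeqId(long timeoutMs) {
            try {
                if (!seqNumAssignedLatch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return -1;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
            return seqId;
        }
    }

    /** One handler publishes `appends` entries; a single consumer stamps them in order. */
    static long[] run(int appends) {
        BlockingQueue<Entry> ringBuffer = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            long next = 1;
            try {
                for (int i = 0; i < appends; i++) {
                    ringBuffer.take().stamp(next++);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        long[] ids = new long[appends];
        for (int i = 0; i < appends; i++) {
            Entry e = new Entry();
            ringBuffer.add(e);
            // The handler stalls here until the consumer has processed this entry.
            ids[i] = e.awaitSeqId(5000);
        }
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ids;
    }
}
```

With one consumer serving every region, each handler's wait in this sketch depends on how far behind the consumer is across all regions, not just its own.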

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> --------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-16698
>                 URL: https://issues.apache.org/jira/browse/HBASE-16698
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 1.2.3
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0
>         Attachments: HBASE-16698.branch-1.patch, 
> HBASE-16698.branch-1.v2.patch, HBASE-16698.patch, HBASE-16698.v2.patch, 
> hadoop0495.et2.jstack
> As titled, on our production environment we observed 98 out of 128 handlers 
> get stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that the problem is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under current branch-1 code logic, all batch puts will call 
> {{WALKey#getWriteEntry}} after appending the edit to the WAL, and 
> {{seqNumAssignedLatch}} is only released when the related append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because currently we're using a single event handler for the ringbuffer, the 
> append calls are handled one by one (actually, lots of our current logic 
> depends on this sequential handling), and this becomes a bottleneck 
> under high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are dealt with sequentially, which causes contention among 
> different regions...
> To fix this, we could also make use of the "sequential appends" mechanism: 
> we could grab the WriteEntry before publishing the append onto the ringbuffer 
> and use it as the sequence id, only we need to add a lock to make "grab 
> WriteEntry" and "append edit" a transaction. This will still cause contention 
> inside a region but could avoid contention between different regions. This 
> solution is already verified in our online environment and proved to be 
> effective.
> Notice that for the master (2.0) branch, since we already changed the write 
> pipeline to sync before writing memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL writes scenario.
