[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

Feng Honghua (JIRA) Tue, 18 Jun 2013 06:32:13 -0700

    [ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686700#comment-13686700 ]


Feng Honghua commented on HBASE-8755:
-------------------------------------

Thanks Ted and [~stack] for the detailed review. I have made and attached an 
updated patch based on trunk, according to your reviews.

Below are answers to some important questions Ted/stack raised in the reviews 
(I have already answered some of Ted's in the comment above):

[Ted] AsyncNotifier does notification by calling syncedTillHere.notifyAll(). 
Can this part be folded into AsyncFlusher ?

 ===> AsyncNotifier competes for syncedTillHere with all the write handler 
threads (which may have finished appendNoSync but not yet be pending in 
syncer()). Performance is better when AsyncSyncer (which just gets notified, 
does the 'sync' and then notifies AsyncNotifier) is kept separate from 
AsyncNotifier (which gets notified by AsyncSyncer and wakes up all pending 
write handler threads).

[stack] Any idea of its effect on general latencies? Does 
HLogPerformanceEvaluation help evaluating this approach? Did you deploy this 
code to production?

  ===> I didn't run HLogPerformanceEvaluation for the performance comparison. 
Instead I used 5 YCSB clients to concurrently load a single RS backed by a 
5-data-node HDFS. Everything is the same between the tests of the old and new 
write thread models except the RS bits. We have been testing it in the test 
cluster for a month, but it is not deployed to production yet. Below is the 
detailed performance comparison for your reference.

  a> 5 YCSB clients, each with 80 concurrent write threads (auto-flush = true)
  b> each YCSB client writes 5,000,000 rows
  c> all 20 regions of the target table are moved to a single RS

Old write thread model:

row size(bytes) latency(ms)     QPS
------------------------------------------
2000            37.3            10715
1000            32.8            12149
500             30.9            12891
200             26.9            14803
10              24.5            16288

New write thread model:

row size(bytes) latency(ms)     QPS
-------------------------------------------
2000            17.3            23024
1000            12.6            31523
500             11.7            33893
200             11.4            34876
10              11.1            35804
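
As a quick sanity check on the two tables, here is a small sketch (not part of 
the patch). It assumes each of the 5 x 80 = 400 client threads always has 
exactly one write in flight, so QPS should be roughly 400 / latency. That 
closed-loop assumption is mine; the class and method names are made up for 
illustration. It also prints the new/old QPS ratio per row size, which comes 
out between about 2.1x and 2.6x.

```java
import java.util.Locale;

// Sanity check of the tables above: with a fixed number of closed-loop
// client threads, throughput is roughly concurrency / latency.
class ThroughputCheck {
    // 5 YCSB clients x 80 write threads each, from the test setup above.
    static final int CONCURRENCY = 5 * 80;

    // Rows: row size (bytes), latency (ms), measured QPS, copied from the tables.
    static final double[][] OLD = {
        {2000, 37.3, 10715}, {1000, 32.8, 12149}, {500, 30.9, 12891},
        {200, 26.9, 14803}, {10, 24.5, 16288}};
    static final double[][] NEW = {
        {2000, 17.3, 23024}, {1000, 12.6, 31523}, {500, 11.7, 33893},
        {200, 11.4, 34876}, {10, 11.1, 35804}};

    // QPS predicted from latency alone, assuming every thread always has
    // exactly one request in flight (an assumption, not stated in the test setup).
    static double predictedQps(double latencyMs) {
        return CONCURRENCY / (latencyMs / 1000.0);
    }

    // Ratio of new-model QPS to old-model QPS for the given table row.
    static double speedup(int row) {
        return NEW[row][2] / OLD[row][2];
    }

    public static void main(String[] args) {
        for (int i = 0; i < OLD.length; i++) {
            System.out.printf(Locale.ROOT,
                "row=%4.0fB old=%.0f (predicted %.0f) new=%.0f (predicted %.0f) speedup=%.2fx%n",
                OLD[i][0], OLD[i][2], predictedQps(OLD[i][1]),
                NEW[i][2], predictedQps(NEW[i][1]), speedup(i));
        }
    }
}
```

The predicted and measured QPS agree to within about 1% for every row, which 
suggests the latency and QPS columns are consistent with each other.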


[stack] Can I still (if only optionally) sync every write as it comes in? (For 
the paranoid).

  ===> Not for now; I'll consider how to make it configurable later on.

[stack] Regards the above, the test is no longer valid given the indirection 
around sync/flush?

  ===> Yes, that test is no longer valid under the new write thread model.

[stack] To be clear, when we call doWrite, we just append the log edit to a 
linked list? (We call it a bufferLock but we just doing append to the linked 
list?)

  ===> Yes, in both the old and new write thread models, doWrite just appends 
the log edit to a linked list, which acts as a 'local' buffer for log edits 
that haven't hit hdfs yet.
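
To make the 'local buffer' idea concrete, here is a minimal sketch of such a 
buffer. It is not the HLog code: PendingBuffer, append and drain are 
hypothetical names, and edits are plain strings. The point is only that the 
handler side is a cheap list append under a lock, and the consumer swaps the 
whole list out so the slow hdfs write happens without holding that lock.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for HLog's local pending buffer; not the patch's code.
class PendingBuffer {
    private final Object bufferLock = new Object();
    private List<String> pendingWrites = new LinkedList<>();
    private long txid = 0;

    // What a write handler does in this sketch: append under the lock and
    // receive the txid assigned to its edit.
    long append(String edit) {
        synchronized (bufferLock) {
            pendingWrites.add(edit);
            return ++txid;
        }
    }

    // What a background writer does: swap the list out under the lock so that
    // the (slow) write to hdfs can proceed without holding bufferLock.
    List<String> drain() {
        synchronized (bufferLock) {
            List<String> drained = pendingWrites;
            pendingWrites = new LinkedList<>();
            return drained;
        }
    }
}
```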

[stack] How does deferred log flush still work when you remove stuff like 
optionalFlushInterval? You say '...don't pend on HLog.syncer() waiting for its 
txid to be sync-ed' but that is another behavior than what we had here 
previously.

  ===> When I say 'still support deferred log flush' I mean that with 
'deferred log flush' a write can still return success to the client without 
waiting/pending on syncer(txid). In this sense, AsyncWriter/AsyncSyncer do 
what the previous LogSyncer did from the point of view of the write handler 
threads: clients don't wait for the write to be persisted before getting a 
success response.
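
The difference on the handler side can be sketched as follows. This is only an 
illustration under my own assumptions: apart from appendNoSync and syncer, the 
names (DeferredFlushSketch, markSynced, put and its boolean flag) are 
hypothetical, and the actual buffering and hdfs calls are left out.

```java
// Hypothetical sketch of the handler-side behaviour; names other than
// appendNoSync and syncer are invented for illustration.
class DeferredFlushSketch {
    private final Object syncedTillHere = new Object(); // monitor for the watermark
    private long syncedTxid = 0;                         // highest txid known synced
    private long lastTxid = 0;

    // Append to the local buffer (elided here) and hand back a txid.
    synchronized long appendNoSync(String edit) {
        return ++lastTxid;
    }

    // Block until the background threads report that txid has been synced.
    void syncer(long txid) {
        synchronized (syncedTillHere) {
            while (syncedTxid < txid) {
                try {
                    syncedTillHere.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    // Called by the background sync/notify threads in this sketch.
    void markSynced(long txid) {
        synchronized (syncedTillHere) {
            syncedTxid = Math.max(syncedTxid, txid);
            syncedTillHere.notifyAll();
        }
    }

    // Returns true if the edit was already durable when success was reported.
    boolean put(String edit, boolean deferredLogFlush) {
        long txid = appendNoSync(edit);
        if (deferredLogFlush) {
            return false;          // ack immediately; persistence happens later
        }
        syncer(txid);              // non-deferred: wait for the sync of this txid
        return true;
    }
}
```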
                
> A new write thread model for HLog to improve the overall HBase write 
> throughput
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-8755
>                 URL: https://issues.apache.org/jira/browse/HBASE-8755
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Feng Honghua
>         Attachments: HBASE-8755-0.94-V0.patch
>
>
> In the current write model, each write handler thread (executing put()) 
> individually goes through a full 'append (hlog local buffer) => HLog writer 
> append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
> which incurs heavy contention on updateLock and flushLock.
> The only optimization, checking whether the current syncTillHere > txid in 
> the hope that another thread has already written/synced this thread's txid 
> to hdfs so that the write/sync can be skipped, helps much less than expected.
> Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
> proposed a new write thread model for writing hdfs sequence files, and the 
> prototype implementation shows a 4X throughput improvement (from 17000 to 
> 70000+). 
> I applied this new write thread model to HLog, and the performance test in 
> our test cluster shows about a 3X throughput improvement (from 12150 to 31520 
> for 1 RS, from 22000 to 70000 for 5 RS). The 1 RS write throughput (1K 
> row-size) even beats that of Bigtable (Percolator, published in 2011, says 
> Bigtable's write throughput then was 31002). I can provide the detailed 
> performance test results if anyone is interested.
> The changes for the new write thread model are as follows:
>  1> All put handler threads append their edits to HLog's local pending 
> buffer; (this notifies the AsyncWriter thread that there are new edits in the 
> local buffer)
>  2> All put handler threads wait in the HLog.syncer() function for the 
> underlying threads to finish the sync that covers their txid;
>  3> A single AsyncWriter thread is responsible for retrieving all the 
> buffered edits from HLog's local pending buffer and writing them to hdfs 
> (hlog.writer.append); (it notifies the AsyncFlusher thread that there are new 
> writes to hdfs that need a sync)
>  4> A single AsyncFlusher thread is responsible for issuing a sync to hdfs 
> to persist the writes made by AsyncWriter; (it notifies the AsyncNotifier 
> thread that the sync watermark has increased)
>  5> A single AsyncNotifier thread is responsible for notifying all pending 
> put handler threads that are waiting in the HLog.syncer() function
>  6> No LogSyncer thread any more (since the AsyncWriter/AsyncFlusher threads 
> always do the same job it did)
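
The six steps in the description above can be sketched as a small in-memory 
model. This is a simplified illustration, not the patch: hdfs is replaced by a 
list, the sync itself is a no-op, and all class, field and method names are 
hypothetical. The thread the description calls AsyncFlusher is the one the 
comment above refers to as AsyncSyncer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the pipeline; hdfs is an in-memory list.
class WalPipelineSketch {
    private final Object bufferLock = new Object();     // guards pending + lastTxid
    private List<String> pending = new ArrayList<>();
    private long lastTxid = 0;
    private final Object writtenLock = new Object();    // AsyncWriter -> AsyncFlusher
    private long writtenTxid = 0;
    private final Object flushedLock = new Object();    // AsyncFlusher -> AsyncNotifier
    private long flushedTxid = 0;
    private final Object syncedTillHere = new Object(); // AsyncNotifier -> handlers
    private long syncedTxid = 0;
    final List<String> fakeHdfs = Collections.synchronizedList(new ArrayList<>());

    private static void await(Object lock) {            // caller must hold the lock
        try {
            lock.wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Steps 1 and 2: append to the local buffer, then wait for the sync.
    void put(String edit) {
        long txid;
        synchronized (bufferLock) {
            pending.add(edit);
            txid = ++lastTxid;
            bufferLock.notifyAll();                      // wake AsyncWriter
        }
        synchronized (syncedTillHere) {
            while (syncedTxid < txid) await(syncedTillHere);
        }
    }

    // Step 3: drain the buffer and "write to hdfs".
    private void writerLoop() {
        while (true) {
            List<String> batch;
            long upTo;
            synchronized (bufferLock) {
                while (pending.isEmpty()) await(bufferLock);
                batch = pending;
                pending = new ArrayList<>();
                upTo = lastTxid;
            }
            fakeHdfs.addAll(batch);                      // stands in for hlog.writer.append
            synchronized (writtenLock) {
                writtenTxid = upTo;
                writtenLock.notifyAll();                 // wake AsyncFlusher
            }
        }
    }

    // Step 4: "sync" everything written so far (a no-op in this model).
    private void flusherLoop() {
        long done = 0;
        while (true) {
            long upTo;
            synchronized (writtenLock) {
                while (writtenTxid <= done) await(writtenLock);
                upTo = writtenTxid;
            }
            synchronized (flushedLock) {
                flushedTxid = upTo;
                flushedLock.notifyAll();                 // wake AsyncNotifier
            }
            done = upTo;
        }
    }

    // Step 5: publish the new watermark and wake all waiting handlers.
    private void notifierLoop() {
        long done = 0;
        while (true) {
            long upTo;
            synchronized (flushedLock) {
                while (flushedTxid <= done) await(flushedLock);
                upTo = flushedTxid;
            }
            synchronized (syncedTillHere) {
                syncedTxid = upTo;
                syncedTillHere.notifyAll();
            }
            done = upTo;
        }
    }

    // Starts the three background threads as daemons.
    void start() {
        for (Runnable r : new Runnable[] {this::writerLoop, this::flusherLoop, this::notifierLoop}) {
            Thread t = new Thread(r);
            t.setDaemon(true);
            t.start();
        }
    }
}
```

In this model only the notifier thread ever takes the syncedTillHere monitor 
on the producer side, so the thread doing the sync never contends with the 
write handlers for it.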

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
