[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779581#comment-13779581 ]
Liu Shaohui commented on HBASE-8755:
------------------------------------
[~stack] [~fenghh]
We redid the comparison test using HLogPE. Here are the results:
Test env:
hdfs: CDH 4.1.0, five datanodes, each with 12 SATA disks.
hbase: 0.94.3
HLogPE runs on one of these datanodes, so one replica of each HLog block is on
the local datanode.
The HLogPE parameters are -iterations 1000000 -keySize 50 -valueSize 100, the
same as in stack's tests.
{code}
# Run HLogPE three times for each thread count and print each run's summary line.
for i in 1 5 50 75 100; do
  for j in 1 2 3; do
    ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify \
      -threads "${i}" -iterations 1000000 -keySize 50 -valueSize 100 \
      &> "log-patch${i}.${j}.txt"
    grep "Summary: " "log-patch${i}.${j}.txt"
  done
done
{code}
||Threads||WithoutPatch||WithPatch||Diff (%)||
|1|579.380|625.937|-8.03|
|1|580.307|630.346|-8.62|
|1|577.853|654.20|-13.21|
|5|799.579|785.696|1.73|
|5|795.013|780.642|1.80|
|5|826.270|781.909|5.36|
|50|3290.482|1165.773|64.57|
|50|3298.387|1167.992|64.58|
|50|3224.495|1154.921|64.18|
|75|4450.760|1253.448|71.83|
|75|4506.143|1269.806|71.82|
|75|4516.453|1245.954|72.41|
|100|5561.074|1493.102|73.15|
|100|5616.810|1496.263|73.36|
|100|5612.268|1468.500|73.83|
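The Diff column is consistent with Diff = (WithoutPatch - WithPatch) / WithoutPatch * 100, so a positive value means the patched run reported the smaller number. A minimal Java sketch (the class name DiffPercent is illustrative) that recomputes it for the first run at each thread count:

{code:java}
import java.util.Locale;

// Recomputes the Diff column; the class and method names are illustrative.
class DiffPercent {
    // Percentage by which WithPatch is smaller than WithoutPatch (negative if it is larger).
    static double diffPercent(double withoutPatch, double withPatch) {
        return (withoutPatch - withPatch) / withoutPatch * 100.0;
    }

    public static void main(String[] args) {
        // {threads, WithoutPatch, WithPatch}: the first run for each thread count in the table
        double[][] rows = {
            {1, 579.380, 625.937},
            {5, 799.579, 785.696},
            {50, 3290.482, 1165.773},
            {75, 4450.760, 1253.448},
            {100, 5561.074, 1493.102},
        };
        for (double[] r : rows) {
            System.out.println(String.format(Locale.ROOT, "threads=%d diff=%.2f%%",
                (int) r[0], diffPercent(r[1], r[2])));
        }
    }
}
{code}

The printed values agree with the table to within 0.01; the table appears to truncate rather than round the last digit.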
a) With 1 thread, our performance is about 40% better than in stack's test,
with both the old and the new thread model. [~stack], which HDFS version did
you test with, and did you use any special configs?
b) With 5 threads, we do not see the -50% degradation.
> A new write thread model for HLog to improve the overall HBase write
> throughput
> -------------------------------------------------------------------------------
>
> Key: HBASE-8755
> URL: https://issues.apache.org/jira/browse/HBASE-8755
> Project: HBase
> Issue Type: Improvement
> Components: Performance, wal
> Reporter: Feng Honghua
> Assignee: stack
> Priority: Critical
> Fix For: 0.96.1
>
> Attachments: 8755trunkV2.txt, HBASE-8755-0.94-V0.patch,
> HBASE-8755-0.94-V1.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch
>
>
> In the current write model, each write handler thread (executing put())
> individually goes through a full 'append (HLog local buffer) => HLog writer
> append (write to HDFS) => HLog writer sync (sync HDFS)' cycle for each write,
> which causes heavy contention on updateLock and flushLock.
> The only existing optimization, checking whether the current syncTillHere >
> txid (in the expectation that another thread has already written/synced this
> txid to HDFS) and skipping the write/sync, actually helps much less than expected.
> Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi
> proposed a new write thread model for writing HDFS sequence files, and the
> prototype implementation shows a 4X throughput improvement (from 17000 to
> 70000+).
> I applied this new write thread model to HLog, and the performance test in our
> test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1
> RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size)
> even beats BigTable's (the Percolator paper published in 2011 says Bigtable's
> write throughput then was 31002). I can provide the detailed performance test
> results if anyone is interested.
> The changes for the new write thread model are as follows:
> 1> All put handler threads append their edits to HLog's local pending buffer
> (notifying the AsyncWriter thread that there are new edits in the local buffer);
> 2> All put handler threads wait in the HLog.syncer() function for the underlying
> threads to finish the sync that contains their txid;
> 3> A single AsyncWriter thread is responsible for retrieving all the buffered
> edits in HLog's local pending buffer and writing them to HDFS
> (hlog.writer.append); it notifies the AsyncFlusher thread that there are new
> writes to HDFS that need a sync;
> 4> A single AsyncFlusher thread is responsible for issuing a sync to HDFS
> to persist the writes made by AsyncWriter; it notifies the AsyncNotifier thread
> that the sync watermark has increased;
> 5> A single AsyncNotifier thread is responsible for notifying all pending
> put handler threads that are waiting in the HLog.syncer() function;
> 6> There is no LogSyncer thread any more, since the AsyncWriter/AsyncFlusher
> threads already do the job it did.
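For comparison, the pre-patch shortcut described in the issue (skip the write/sync when syncTillHere already covers the handler's txid) can be modeled roughly as below. This is a hypothetical sketch, not HLog code: the class SyncTillHereCheck and its fields are illustrative, it models only the sync step, it assumes "already covered" means syncedTillHere >= txid, and a counter stands in for the real HDFS sync. In this sketch every handler whose txid is not yet covered still queues on flushLock.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the pre-patch "skip the sync if another thread already did it" check.
class SyncTillHereCheck {
    private final AtomicLong lastTxid = new AtomicLong(0); // txids handed out by append()
    private volatile long syncedTillHere = 0;              // highest txid known to be synced
    private final Object flushLock = new Object();
    private int syncCount = 0;                             // simulated HDFS syncs, guarded by flushLock

    long append() {
        return lastTxid.incrementAndGet(); // stand-in for appending an edit to the local buffer
    }

    void sync(long txid) {
        if (syncedTillHere >= txid) {
            return; // fast path: another handler's sync already covered this txid
        }
        synchronized (flushLock) { // every uncovered handler queues here, one sync at a time
            if (syncedTillHere >= txid) {
                return; // covered while this handler waited for the lock
            }
            long target = lastTxid.get(); // this sync persists everything appended so far
            syncCount++;                  // stand-in for the HDFS sync
            syncedTillHere = target;
        }
    }

    int syncCount() {
        synchronized (flushLock) {
            return syncCount;
        }
    }

    public static void main(String[] args) {
        SyncTillHereCheck log = new SyncTillHereCheck();
        long t1 = log.append(), t2 = log.append(), t3 = log.append();
        log.sync(t3); // one sync covers txids 1..3
        log.sync(t1); // skipped by the fast path
        log.sync(t2); // skipped by the fast path
        System.out.println("syncs issued: " + log.syncCount()); // prints "syncs issued: 1"
    }
}
{code}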
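The six steps of the proposed model can be sketched as a three-stage pipeline. Everything here is an illustrative assumption rather than the HBASE-8755 patch: the class AsyncWalSketch, the lock and field names, an in-memory list standing in for the HLog file, and a counter standing in for the HDFS sync. Each stage waits on its own monitor and hands a txid watermark to the next stage, so one sync can cover all edits that accumulated while the previous sync was in flight.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the proposed write pipeline; it is not the HBASE-8755 patch.
class AsyncWalSketch {
    private final List<String> pending = new ArrayList<>(); // HLog's local pending buffer, guarded by this
    private long lastTxid = 0;     // last txid handed to a handler, guarded by this
    private long writtenTxid = 0;  // highest txid "appended" to the file, guarded by writeLock
    private long flushedTxid = 0;  // highest txid "synced", guarded by flushLock
    private long notifiedTxid = 0; // sync watermark visible to handlers, guarded by notifyLock
    private final Object writeLock = new Object(), flushLock = new Object(), notifyLock = new Object();
    final List<String> hdfs = Collections.synchronizedList(new ArrayList<>()); // stand-in for the HLog file
    volatile int syncCalls = 0; // simulated HDFS syncs, incremented only by AsyncFlusher

    // Step 1: a put handler appends its edit to the local buffer and wakes AsyncWriter.
    synchronized long append(String edit) {
        pending.add(edit);
        notifyAll();
        return ++lastTxid;
    }

    // Step 2: the handler waits until a sync covering its txid has been announced.
    void syncer(long txid) throws InterruptedException {
        synchronized (notifyLock) {
            while (notifiedTxid < txid) notifyLock.wait();
        }
    }

    // Step 3: AsyncWriter drains the whole buffer, "appends" it, then wakes AsyncFlusher.
    private void asyncWriter() throws InterruptedException {
        while (true) {
            List<String> batch;
            long upTo;
            synchronized (this) {
                while (pending.isEmpty()) wait();
                batch = new ArrayList<>(pending);
                pending.clear();
                upTo = lastTxid;
            }
            hdfs.addAll(batch); // stand-in for hlog.writer.append(...)
            synchronized (writeLock) { writtenTxid = upTo; writeLock.notifyAll(); }
        }
    }

    // Step 4: AsyncFlusher issues one sync for everything written so far, then wakes AsyncNotifier.
    private void asyncFlusher() throws InterruptedException {
        long synced = 0;
        while (true) {
            synchronized (writeLock) {
                while (writtenTxid <= synced) writeLock.wait();
                synced = writtenTxid;
            }
            syncCalls++; // stand-in for the HDFS sync
            synchronized (flushLock) { flushedTxid = synced; flushLock.notifyAll(); }
        }
    }

    // Step 5: AsyncNotifier publishes the new watermark and wakes every waiting handler.
    private void asyncNotifier() throws InterruptedException {
        long notified = 0;
        while (true) {
            synchronized (flushLock) {
                while (flushedTxid <= notified) flushLock.wait();
                notified = flushedTxid;
            }
            synchronized (notifyLock) { notifiedTxid = notified; notifyLock.notifyAll(); }
        }
    }

    private interface Loop { void run() throws InterruptedException; }
    // Step 6: these three daemon threads are the only background threads; there is no LogSyncer.
    void start() {
        for (Loop loop : new Loop[] {this::asyncWriter, this::asyncFlusher, this::asyncNotifier}) {
            Thread t = new Thread(() -> {
                try { loop.run(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            t.setDaemon(true);
            t.start();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncWalSketch wal = new AsyncWalSketch();
        wal.start();
        List<Thread> handlers = new ArrayList<>();
        for (int i = 0; i < 8; i++) { // 8 concurrent put handlers, 100 edits each
            Thread h = new Thread(() -> {
                try { for (int j = 0; j < 100; j++) wal.syncer(wal.append("edit")); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            handlers.add(h);
            h.start();
        }
        for (Thread h : handlers) h.join();
        System.out.println(wal.hdfs.size() + " edits written using " + wal.syncCalls + " syncs");
    }
}
{code}

Running main starts 8 handler threads that each append and sync 100 edits; all 800 edits end up in the list, while the reported number of syncs depends on how many edits each sync happened to batch.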