[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

stack (JIRA) Thu, 05 Dec 2013 05:36:41 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840074#comment-13840074
 ]


stack commented on HBASE-8755:
------------------------------

Nice review [~himan...@cloudera.com]

On 3. above: yes, that is true.  HLogPE does not seem to be representative, as 
you suggest.  But does your change below...

{code}
-            hlog.append(hri, hri.getTable(), walEdit, now, htd, region.getSequenceId());
+            // this is how almost all users of HLog use it (all but compaction calls).
+            long txid = hlog.appendNoSync(hri, hri.getTable(), walEdit, clusters, now, htd,
+              region.getSequenceId(), true, nonce, nonce);
+            hlog.sync(txid);
+
{code}

... bring it closer to a 'real' use case?  I see over in HRegion that we do a 
bunch of appendNoSync in minibatch or even in put before we call sync.   Should 
we append more than just one set of edits before we call the sync?
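
To make the question concrete, here is a minimal sketch of appending several sets of edits before a single sync.  ToyWal and BatchedAppendSketch are made-up in-memory stand-ins, not the HLog/FSHLog API; only the appendNoSync/sync(txid) shape is borrowed from the diff above, and the batch size is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a WAL; NOT the real HLog API.
final class ToyWal {
  private final List<String> pending = new ArrayList<>();
  private long lastAppendedTxid = 0;
  private long syncedTxid = 0;
  private int syncCalls = 0;

  // Buffer an edit and hand back its transaction id, without syncing.
  synchronized long appendNoSync(String edit) {
    pending.add(edit);
    return ++lastAppendedTxid;
  }

  // Make everything up to txid durable; a no-op if already covered.
  synchronized void sync(long txid) {
    if (txid <= syncedTxid) {
      return;
    }
    pending.clear();                 // pretend the buffered edits reached the filesystem
    syncedTxid = lastAppendedTxid;   // one sync covers every buffered edit
    syncCalls++;
  }

  synchronized int syncCalls() { return syncCalls; }
  synchronized long syncedTxid() { return syncedTxid; }
}

final class BatchedAppendSketch {
  // Append batchSize edits per sync, roughly the way a minibatch would.
  static int run(ToyWal wal, int totalEdits, int batchSize) {
    for (int i = 1; i <= totalEdits; i++) {
      long txid = wal.appendNoSync("edit-" + i);
      if (i % batchSize == 0 || i == totalEdits) {
        wal.sync(txid);
      }
    }
    return wal.syncCalls();
  }
}
```

In this toy, run(new ToyWal(), 100, 1) issues 100 syncs, while run(new ToyWal(), 100, 10) issues 10.  A PE run that syncs after every single append measures the first pattern only.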

I suppose on a regionserver with a load of regions loaded up on it, all these 
syncs can come crashing in on top of each other on to the underlying WAL in an 
arbitrary manner -- something Feng Honghua's patch mitigates somewhat by making 
it so syncs are done when FSHLog thinks it appropriate rather than when some 
arbitrary HRegion call thinks it right ... and this is probably part of the 
reason for the perf improvement.

Could we better regulate the sync calls so they are even less arbitrary?  Even 
them out?  It could make for better performance if there was a mechanism 
against syncs clumping together.
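
One possible mechanism, sketched below under my own assumptions: allow at most one sync in flight, and let callers that arrive meanwhile wait and piggyback on it or on the next one.  CoalescingSyncer is a hypothetical class, not code from either patch, and the 1 ms sleep is only a stand-in for a slow filesystem sync.

```java
// Hypothetical sketch of sync coalescing; NOT taken from either patch.
final class CoalescingSyncer {
  private final Object lock = new Object();
  private long appendedTxid = 0;     // highest txid handed out
  private long syncedTxid = 0;       // highest txid known to be durable
  private boolean syncInFlight = false;
  private long physicalSyncs = 0;

  long append() {
    synchronized (lock) {
      return ++appendedTxid;
    }
  }

  void sync(long txid) throws InterruptedException {
    long target;
    synchronized (lock) {
      while (true) {
        if (syncedTxid >= txid) {
          return;                    // someone else's sync already covered us
        }
        if (!syncInFlight) {
          break;                     // nobody is syncing: this caller does it
        }
        lock.wait();                 // piggyback on the sync in flight
      }
      syncInFlight = true;
      target = appendedTxid;         // cover everything appended so far
    }
    boolean ok = false;
    try {
      Thread.sleep(1);               // stand-in for the real filesystem sync
      ok = true;
    } finally {
      synchronized (lock) {
        if (ok) {
          syncedTxid = target;
          physicalSyncs++;
        }
        syncInFlight = false;
        lock.notifyAll();
      }
    }
  }

  // Runs `handlers` threads doing append+sync; returns
  // {physicalSyncs, syncedTxid, appendedTxid}.
  static long[] demo(int handlers, int editsPerHandler) {
    CoalescingSyncer syncer = new CoalescingSyncer();
    Thread[] threads = new Thread[handlers];
    for (int h = 0; h < handlers; h++) {
      threads[h] = new Thread(() -> {
        try {
          for (int i = 0; i < editsPerHandler; i++) {
            syncer.sync(syncer.append());
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      threads[h].start();
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    synchronized (syncer.lock) {
      return new long[] {syncer.physicalSyncs, syncer.syncedTxid, syncer.appendedTxid};
    }
  }
}
```

The number of physical syncs depends on thread timing, so the only guarantees are that every txid ends up synced and that there are never more physical syncs than sync() calls.  Whether this kind of pacing actually helps on a real WAL would need measuring.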

Looking at your patch, the syncer is very much like Feng Honghua's -- it is 
interesting that you two independently came up w/ a similar multithreaded 
syncing mechanism.  That would seem to 'prove' this is a good approach.  Feng's 
patch is much further along, with a bunch of cleanup of FSHLog.  Will wait on 
his comments on what he thinks of doing without AsyncWriter and AsyncNotifier.

Looks like your patch is far enough along for us to do tests comparing the 
approaches?



> A new write thread model for HLog to improve the overall HBase write 
> throughput
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-8755
>                 URL: https://issues.apache.org/jira/browse/HBASE-8755
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, wal
>            Reporter: Feng Honghua
>            Assignee: stack
>            Priority: Critical
>         Attachments: 8755-syncer.patch, 8755trunkV2.txt, 
> HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, 
> HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, 
> HBASE-8755-trunk-v4.patch
>
>
> In the current write model, each write handler thread (executing put()) 
> individually goes through a full 'append (hlog local buffer) => HLog writer 
> append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
> which incurs heavy contention on updateLock and flushLock.
> The only optimization, checking whether the current syncTillHere > txid in the 
> hope that another thread has already written/synced this thread's txid to hdfs 
> so that the write/sync can be omitted, helps much less than expected.
> Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
> proposed a new write thread model for writing hdfs sequence files, and the 
> prototype implementation shows a 4X improvement in throughput (from 17000 to 
> 70000+). 
> I applied this new write thread model in HLog, and the performance test in our 
> test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
> RS, from 22000 to 70000 for 5 RS). The 1 RS write throughput (1K row-size) 
> even beats that of BigTable (Percolator, published in 2011, says Bigtable's 
> write throughput then was 31002). I can provide the detailed performance test 
> results if anyone is interested.
> The change for the new write thread model is as below:
>  1> All put handler threads append their edits to HLog's local pending buffer 
> (this notifies the AsyncWriter thread that there are new edits in the local 
> buffer);
>  2> All put handler threads wait in the HLog.syncer() function for the 
> underlying threads to finish the sync that contains their txid;
>  3> A single AsyncWriter thread is responsible for retrieving all the buffered 
> edits in HLog's local pending buffer and writing them to hdfs 
> (hlog.writer.append) (it notifies the AsyncFlusher thread that there are new 
> writes to hdfs that need a sync);
>  4> A single AsyncFlusher thread is responsible for issuing a sync to hdfs 
> to persist the writes by AsyncWriter (it notifies the AsyncNotifier thread 
> that the sync watermark has increased);
>  5> A single AsyncNotifier thread is responsible for notifying all pending 
> put handler threads which are waiting in the HLog.syncer() function;
>  6> No LogSyncer thread any more (since the AsyncWriter/AsyncFlusher threads 
> always do the job it did).
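
For readers following along, the six steps in the description above can be sketched roughly as below.  This is a toy under my own assumptions: PipelinedWalSketch, its fields, and the single shared lock are all hypothetical, an in-memory list stands in for hdfs, and it is not the code in any of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy of the writer -> flusher -> notifier pipeline; NOT the patch.
final class PipelinedWalSketch {
  private final Object lock = new Object();
  private final List<String> pendingBuffer = new ArrayList<>(); // step 1 buffer
  private final List<String> written = new ArrayList<>();       // stand-in for hdfs
  private long appendedTxid, writtenTxid, flushedTxid, notifiedTxid, flushCalls;
  private boolean closed;
  private final Thread writer = new Thread(this::writerLoop, "AsyncWriter");
  private final Thread flusher = new Thread(this::flusherLoop, "AsyncFlusher");
  private final Thread notifier = new Thread(this::notifierLoop, "AsyncNotifier");

  void start() { writer.start(); flusher.start(); notifier.start(); }

  long append(String edit) {                       // step 1
    synchronized (lock) {
      pendingBuffer.add(edit);
      long txid = ++appendedTxid;
      lock.notifyAll();
      return txid;
    }
  }

  void sync(long txid) throws InterruptedException { // step 2
    synchronized (lock) {
      while (notifiedTxid < txid) lock.wait();
    }
  }

  private void writerLoop() {                      // step 3
    try {
      while (true) {
        List<String> batch;
        long upTo;
        synchronized (lock) {
          while (!closed && appendedTxid == writtenTxid) lock.wait();
          if (appendedTxid == writtenTxid) return; // closed and drained
          batch = new ArrayList<>(pendingBuffer);
          pendingBuffer.clear();
          upTo = appendedTxid;
        }
        written.addAll(batch);                     // only this thread writes here
        synchronized (lock) { writtenTxid = upTo; lock.notifyAll(); }
      }
    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }

  private void flusherLoop() {                     // step 4
    try {
      while (true) {
        long upTo;
        synchronized (lock) {
          while (!closed && writtenTxid == flushedTxid) lock.wait();
          if (writtenTxid == flushedTxid) return;
          upTo = writtenTxid;
        }
        // a real flusher would sync the filesystem here, outside the lock
        synchronized (lock) { flushedTxid = upTo; flushCalls++; lock.notifyAll(); }
      }
    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }

  private void notifierLoop() {                    // step 5
    try {
      synchronized (lock) {
        while (true) {
          while (!closed && flushedTxid == notifiedTxid) lock.wait();
          if (flushedTxid == notifiedTxid) return;
          notifiedTxid = flushedTxid;              // publish the sync watermark
          lock.notifyAll();
        }
      }
    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }

  // Callers are expected to have synced everything before closing.
  void close() throws InterruptedException {
    synchronized (lock) { closed = true; lock.notifyAll(); }
    writer.join(); flusher.join(); notifier.join();
  }

  // Returns {edits written, flush calls, final notified txid}.
  static long[] demo(int handlers, int editsPerHandler) {
    PipelinedWalSketch wal = new PipelinedWalSketch();
    wal.start();
    Thread[] threads = new Thread[handlers];
    for (int h = 0; h < handlers; h++) {
      final int id = h;
      threads[h] = new Thread(() -> {
        try {
          for (int i = 0; i < editsPerHandler; i++) wal.sync(wal.append("h" + id + "-" + i));
        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
      });
      threads[h].start();
    }
    try {
      for (Thread t : threads) t.join();
      wal.close();
    } catch (InterruptedException e) { throw new IllegalStateException(e); }
    return new long[] {wal.written.size(), wal.flushCalls, wal.notifiedTxid};
  }
}
```

The toy funnels all signalling through one lock for brevity; the point is only the hand-off order (buffer, write, flush, notify) and that handlers block until the notified watermark reaches their txid.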



--
This message was sent by Atlassian JIRA
(v6.1#6144)
