[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

Feng Honghua (JIRA) Mon, 09 Dec 2013 07:08:02 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843224#comment-13843224
 ]


Feng Honghua commented on HBASE-8755:
-------------------------------------

[~stack] : thanks for the comment, below are the comments after corresponding 
change. a new patch based on [~v.himanshu]'s latest v5 patch is attached(thanks 
[~v.himanshu])
bq.Remove these asserts rather than comment them out given they depended on a 
facility this patch removes.
==> done
bq.using a Random for choosing an arbitrary thread for a list of 4 is 
heavyweight
==> done
bq.Please remove all mentions of AsyncFlush since it no longer exists
==> done
bq.Is this comment right? // txid <= failedTxid will fail by throwing asyncIOE; 
 Should it be >= failedTxid?
==> this comment is right: txid larger than failedTxid isn't sync-ed by the one 
that notifies failedTxid. but txid smaller than or equal to failedTxid is (not 
must be, but since we don't maintain a txid range to syncer mapping, so we fail 
all txid smaller than or equal to failedTxid, this aligns with HBase's write 
semantic of 'failed write may succeed in fact'. this is a point we can refine 
later on by adding txid range to sync operation mapping to precisely indicate 
failure)
bq.This should be volatile since it is set by AsyncSync and then used by the 
main FSHLog thread (you have an assert to check it not null – maybe you ran 
into an issue here already?):   + private IOException asyncIOE = null;
==> done
bq.'bufferLock' if a very generic name. Could it be more descriptive? It is a 
lock held for a short while while AsyncWriter moves queued edits off the 
globally seen queue to a local queue just before we send the edits to the WAL. 
You add a method named getPendingWrites that requires this lock be held. Could 
we tie the method and the lock together better? Name it pendingWritesLock? (The 
name of the list to hold the pending writes is pendingWrites).
==> done
bq. (because the HDFS write-method is pretty heavyweight as far as locking is 
concerned.) I think the heavyweight referred to in the above is hbase 
locking...please adjust the comment
==> done
bq.Comments on what these threads do will help the next code reader
==> done
bq.Your patch should remove the optional flush config from hbase-default.xml 
too since it no longer is relevant
==> done
bq.A small nit is you might look at other threads in hbase and see how they are 
named...It might be good if these better align
==> done
bq.Probably make the number of asyncsyncers a configuration
==> done
bq.but we do not seem to be doing it on the other call to doWrite at around 
line #969 inside in append
==> doWrite is called inside append, and the bufferLock(now renamed to 
pendingWritesLock) is held there
bq.This method(setPendingTxid) is only called at close time if I read the patch 
right
==> it's called inside append() once doWrite() is done to notify AsyncWriter 
there are new pendingWrites to write to HDFS, it's not called at close time. 
you can double check it:-)
bq.Is this 'fatal'? Or is it an 'error' 
==> done
bq.and request a log roll yet we carry on to try and sync, an op that will 
likely fail? We are ok here? We updated the write txid but not the sync txid so 
that should be fine.
==> we can't retry to sync after a log roll since we can't sync to a new hlog 
while the writes were written to the old hlog. we failed all the transactions 
with txid <= write txid, it's ok here.
bq.Do we need this: if (!asyncSyncers[ i ].isSyncing()) DFSClient will allow us 
call sync concurrently. I think DFSClient allow us call sync 
concurrently...HDFS will handle(synchronize) concurrent sync?
bq.Can these be static classes or do they need context form the hosting FSHLog?
==> they all need context from hosting FSHLog (such as 
writer/asyncWriter/asyncSyncer/asyncNotifier)
bq.These method names should not talk about 'flush'. They should be named 
'sync' instead. Same for the flushlock.
==> done
bq.Why atomic boolean and not just a volatile here? private AtomicBoolean 
isSyncing = new AtomicBoolean(false);
==> done
bq.The above is very important. All your threads do this?
==> yes
bq.It talks about writeChunk being expensive but we are not doing anything to 
ameliorate dfsclient writes if I dig down into our log writer
==> done (remove it)

> A new write thread model for HLog to improve the overall HBase write 
> throughput
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-8755
>                 URL: https://issues.apache.org/jira/browse/HBASE-8755
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, wal
>            Reporter: Feng Honghua
>            Assignee: stack
>            Priority: Critical
>         Attachments: 8755-syncer.patch, 8755trunkV2.txt, 
> HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, 
> HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, 
> HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, HBASE-8755-v5.patch
>
>
> In current write model, each write handler thread (executing put()) will 
> individually go through a full 'append (hlog local buffer) => HLog writer 
> append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
> which incurs heavy race condition on updateLock and flushLock.
> The only optimization where checking if current syncTillHere > txid in 
> expectation for other thread help write/sync its own txid to hdfs and 
> omitting the write/sync actually help much less than expectation.
> Three of my colleagues(Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
> proposed a new write thread model for writing hdfs sequence file and the 
> prototype implementation shows a 4X improvement for throughput (from 17000 to 
> 70000+). 
> I apply this new write thread model in HLog and the performance test in our 
> test cluster shows about 3X throughput improvement (from 12150 to 31520 for 1 
> RS, from 22000 to 70000 for 5 RS), the 1 RS write throughput (1K row-size) 
> even beats the one of BigTable (Precolator published in 2011 says Bigtable's 
> write throughput then is 31002). I can provide the detailed performance test 
> results if anyone is interested.
> The change for new write thread model is as below:
>  1> All put handler threads append the edits to HLog's local pending buffer; 
> (it notifies AsyncWriter thread that there is new edits in local buffer)
>  2> All put handler threads wait in HLog.syncer() function for underlying 
> threads to finish the sync that contains its txid;
>  3> An single AsyncWriter thread is responsible for retrieve all the buffered 
> edits in HLog's local pending buffer and write to the hdfs 
> (hlog.writer.append); (it notifies AsyncFlusher thread that there is new 
> writes to hdfs that needs a sync)
>  4> An single AsyncFlusher thread is responsible for issuing a sync to hdfs 
> to persist the writes by AsyncWriter; (it notifies the AsyncNotifier thread 
> that sync watermark increases)
>  5> An single AsyncNotifier thread is responsible for notifying all pending 
> put handler threads which are waiting in the HLog.syncer() function
>  6> No LogSyncer thread any more (since there is always 
> AsyncWriter/AsyncFlusher threads do the same job it does)



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

Reply via email to