[
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919033#comment-13919033
]
Liyin Tang commented on HBASE-10659:
------------------------------------
1) Since updating the memstore is much faster than HLog syncing, one
memstore-update-thread seems to be sufficient. Alternatively, we can make it
configurable so that each HLogSyncer thread has a corresponding
memstore-update-thread.
2) The HLogSyncer thread will batch multiple transactions from different IPC
writer threads into a group commit, and then sync this group commit into the
HLog stream. The memstore-update-thread will then take this group commit and
update the corresponding memstore in sequence id order.
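
The hand-off in 2) can be sketched roughly as follows. This is only a minimal, single-threaded illustration and not HBase code: the class GroupCommitSketch, the Txn type, and the method names are hypothetical, the HLog sync is reduced to a comment, and a plain list of applied sequence ids stands in for the memstore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; none of these names are HBase APIs.
final class GroupCommitSketch {
    /** One transaction: a sequence id plus a placeholder payload. */
    static final class Txn {
        final long seqId;
        final String edit;
        Txn(long seqId, String edit) { this.seqId = seqId; this.edit = edit; }
    }

    // Group commits that have been synced, queued in sync order.
    private final BlockingQueue<List<Txn>> synced = new LinkedBlockingQueue<>();
    // Stand-in for the memstore: records the order in which edits were applied.
    private final List<Long> applied = new ArrayList<>();
    private long nextSeqId = 1;

    /** Syncer side: number the batch, then hand it off as one group commit. */
    void syncGroup(List<String> edits) {
        List<Txn> group = new ArrayList<>();
        for (String e : edits) {
            group.add(new Txn(nextSeqId++, e));
        }
        // A real syncer would write and sync the HLog here, before the hand-off.
        synced.add(group);
    }

    /** Memstore-update side: apply one queued group; false if none is queued. */
    boolean applyNextGroup() {
        List<Txn> group = synced.poll(); // non-blocking; null when the queue is empty
        if (group == null) {
            return false;
        }
        for (Txn t : group) {
            applied.add(t.seqId); // groups and their members arrive in seqId order
        }
        return true;
    }

    List<Long> appliedSeqIds() { return new ArrayList<>(applied); }
}
```

Because a single syncer assigns the ids and the queue is FIFO, one consumer applying whole groups preserves sequence id order without any extra sorting. That is the assumption this sketch relies on.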
> [89-fb] Optimize the threading model in HBase write path
> --------------------------------------------------------
>
> Key: HBASE-10659
> URL: https://issues.apache.org/jira/browse/HBASE-10659
> Project: HBase
> Issue Type: New Feature
> Reporter: Liyin Tang
>
> Recently, we have built multiple prototypes to optimize the HBase (0.89) write
> path. Based on the simulator results, the following model is able to
> achieve much higher overall throughput with fewer threads.
> IPC Writer Thread Pool:
> IPC handler threads will prepare all Put requests and append the WALEdit, as
> one transaction, into a concurrent collection while holding a read lock. They
> then simply return.
> HLogSyncer Thread:
> Each HLogSyncer thread corresponds to one HLog stream. It swaps the
> concurrent collection while holding a write lock, then iterates over all the
> elements in the previous concurrent collection, generates the sequence id for
> each transaction, and writes it to the HLog. After the HLog sync is done, it
> appends these transactions as a batch into a blocking queue.
> Memstore Update Thread:
> The memstore update thread will poll the blocking queue and update the
> memstore for each transaction by using the sequence id as MVCC. Once the
> memstore update is done, it dispatches the call to the responder thread pool
> to return to the client.
> Responder Thread Pool:
> The responder thread pool will return the RPC calls in parallel.
> We are still evaluating this model and will share more results/numbers once
> they are ready. We would really appreciate any comments in advance!
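
The read-lock append and write-lock swap described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the prototype's code: the class SwapBufferSketch and its methods are hypothetical, a WALEdit is represented by a plain String, and the HLog write is reduced to tagging each edit with its sequence id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; none of these names are HBase APIs.
final class SwapBufferSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Current staging collection; replaced wholesale by the syncer.
    private Queue<String> pending = new ConcurrentLinkedQueue<>();
    // Only touched by the single syncer thread in this sketch.
    private long nextSeqId = 1;

    /** IPC handler side: many writers may append concurrently under the read lock. */
    void append(String edit) {
        lock.readLock().lock();
        try {
            pending.add(edit);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Syncer side: swap in an empty collection under the write lock, then number the drained edits. */
    List<String> swapAndNumber() {
        Queue<String> drained;
        lock.writeLock().lock();
        try {
            drained = pending;
            pending = new ConcurrentLinkedQueue<>();
        } finally {
            lock.writeLock().unlock();
        }
        // No writer can still be appending to 'drained', so it is iterated without a lock.
        List<String> numbered = new ArrayList<>();
        for (String edit : drained) {
            numbered.add((nextSeqId++) + ":" + edit);
        }
        return numbered;
    }
}
```

The read lock here does not guard readers in the usual sense. It lets any number of handlers append at once, while the write lock gives the syncer a brief exclusive window in which to exchange the collection reference.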