[ https://issues.apache.org/jira/browse/HDFS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446918#comment-13446918 ]
Chao Shi commented on HDFS-3885:
--------------------------------
A similar idea, to save network latency: sync logs to the lagging node in
larger batches. I'd guess a batch of 512KB or 1MB would be much more efficient.
Note that this also works for uncommitted transactions. Consider 3 JNs:
Tx1 has been committed by JN1 and JN2, and QJM is now writing Tx2. JN3 is
lagging, so both Tx1 and Tx2 are still queued for it, and we can send them to
JN3 in a single batch.
Implementing this would require larger changes to the current code structure,
which simply uses a single-threaded executor as the queue.
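A minimal sketch of the batching idea in the comment above, assuming the per-JN queue holds discrete edit batches, each covering a contiguous txid range. `EditBatch`, `coalesce`, and the 1MB default are hypothetical names and values chosen for illustration; this is not QJM code. Queued batches are merged only while their txids stay contiguous and the merged payload fits under the size cap.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


# Hypothetical model of one queued write: num_txns transactions starting
# at first_txid, already serialized into `data`.
@dataclass
class EditBatch:
    first_txid: int
    num_txns: int
    data: bytes

    @property
    def last_txid(self) -> int:
        return self.first_txid + self.num_txns - 1


def coalesce(queue: Deque[EditBatch], max_bytes: int = 1 << 20) -> List[EditBatch]:
    """Drain the queue into as few sends as possible.

    Adjacent batches are merged while they are contiguous in txid and the
    merged payload stays within max_bytes. A single batch larger than
    max_bytes is still sent, on its own.
    """
    sends: List[EditBatch] = []
    while queue:
        first = queue.popleft()
        parts = [first.data]
        size = len(first.data)
        num_txns = first.num_txns
        last_txid = first.last_txid
        # Keep absorbing the next queued batch while it directly follows
        # the current range and still fits under the size cap.
        while (queue
               and queue[0].first_txid == last_txid + 1
               and size + len(queue[0].data) <= max_bytes):
            nxt = queue.popleft()
            parts.append(nxt.data)
            size += len(nxt.data)
            num_txns += nxt.num_txns
            last_txid = nxt.last_txid
        sends.append(EditBatch(first.first_txid, num_txns, b"".join(parts)))
    return sends
```

In the Tx1/Tx2 scenario, `coalesce(deque([EditBatch(1, 1, b"tx1"), EditBatch(2, 1, b"tx2")]))` yields a single send covering txids 1 through 2, so JN3 is caught up with one round trip instead of two.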
> QJM: optimize log sync when JN is lagging behind
> ------------------------------------------------
>
> Key: HDFS-3885
> URL: https://issues.apache.org/jira/browse/HDFS-3885
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
>
> This is a potential optimization that we can add to the JournalNode: when one
> of the nodes is lagging behind the others (e.g. because its local disk is
> slower or there was a network blip), it receives edits after they've been
> committed to a majority. It can tell this because the committed txid included
> in the request info is higher than the highest txid in the actual batch to be
> written. In this case, we know that this batch has already been fsynced to a
> quorum of nodes, so we can skip the fsync() on the laggy node, helping it to
> catch back up.
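A minimal sketch of the JournalNode-side check described above, using a plain file handle as a stand-in for the JN's edit log. `needs_fsync` and `append_edits` are hypothetical helpers, not HDFS APIs. Following the description, the fsync is skipped only when the committed txid carried in the request is strictly higher than the last txid in the batch being written.

```python
import os
from typing import BinaryIO


def needs_fsync(committed_txid: int, batch_last_txid: int) -> bool:
    """Return True if this node must fsync the batch itself.

    If the writer reports a committed txid higher than the last txid in
    this batch, a quorum has already fsynced the batch, so a lagging node
    can skip the fsync to catch up faster.
    """
    return committed_txid <= batch_last_txid


def append_edits(f: BinaryIO, data: bytes,
                 committed_txid: int, batch_last_txid: int) -> bool:
    """Append a batch to the log; return True if it was fsynced."""
    f.write(data)
    f.flush()  # hand buffered bytes to the OS either way
    if needs_fsync(committed_txid, batch_last_txid):
        os.fsync(f.fileno())  # make the batch durable before acking
        return True
    return False
```

The trade-off is that a skipped batch is not yet durable on the lagging node itself; the argument in the description is that this is acceptable because a quorum already holds it durably.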