[ https://issues.apache.org/jira/browse/HDFS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HDFS-3885:
------------------------------
Attachment: hdfs-3885.txt
I couldn't find an easy way to write a unit test for this change, so I
verified it manually as follows:
- Started a 3-node QJM cluster
- Ran strace -efdatasync,write -f <pid of one JN>
- Wrote lots of txns to the NN. This showed a lot of fdatasync and write calls,
mostly alternating (write a chunk, fdatasync, write a chunk, fdatasync, etc)
- kill -STOPped that JN for 10-15 seconds
- kill -CONTed that JN
- Saw a bunch of write() calls with no fdatasync calls while it was still
catching up. After it caught up, it started syncing again.
I also verified that it caught up much faster with this change in place.
> QJM: optimize log sync when JN is lagging behind
> ------------------------------------------------
>
> Key: HDFS-3885
> URL: https://issues.apache.org/jira/browse/HDFS-3885
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-3885.txt
>
>
> This is a potential optimization that we can add to the JournalNode: when one
> of the nodes is lagging behind the others (e.g. because its local disk is
> slower or there was a network blip), it receives edits after they've been
> committed to a majority. It can tell this because the committed txid included
> in the request info is higher than the highest txid in the actual batch to be
> written. In this case, we know that this batch has already been fsynced to a
> quorum of nodes, so we can skip the fsync() on the laggy node, helping it to
> catch back up.
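
As a rough illustration of the check described above, here is a minimal Java
sketch. The class name LagSyncSketch, the method canSkipSync, and both
parameter names are hypothetical and are not taken from the attached patch.
The sketch only encodes the comparison stated in the description, using the
strict "higher than" test, and assumes both txids are available as longs.

```java
// Hypothetical sketch, not the code from hdfs-3885.txt.
final class LagSyncSketch {

    /**
     * Returns true when the batch can be written without an fsync.
     *
     * committedTxId:    the committed txid carried in the request info
     * lastTxnIdInBatch: the highest txid in the batch about to be written
     *
     * If the committed txid is already higher than everything in the batch,
     * the batch has been synced on a quorum of nodes, so this node is
     * lagging and can skip its own sync.
     */
    static boolean canSkipSync(long committedTxId, long lastTxnIdInBatch) {
        return committedTxId > lastTxnIdInBatch;
    }

    public static void main(String[] args) {
        // Lagging node: quorum has committed up to 120, batch ends at 100.
        System.out.println(canSkipSync(120L, 100L)); // prints "true"
        // Node is keeping up: the batch is not yet known to be committed.
        System.out.println(canSkipSync(100L, 100L)); // prints "false"
    }
}
```

How the equality case should be treated is an assumption here; the sketch
simply follows the wording of the description.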