[
https://issues.apache.org/jira/browse/HDFS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991659#comment-12991659
]
Todd Lipcon commented on HDFS-1597:
-----------------------------------
Hi Konst. You're right, {{logSyncAll}} is called by {{enterSafeMode}} not
{{saveNamespace}}, my bad.
But I think since we have the policy that {{saveNamespace}} always runs while
in safe mode, and we never write edits while in safe mode, we should be
safe. Another way of saying this is that between {{enterSafeMode}} and
{{leaveSafeMode}} we won't have any pending edits. This might depend on
completing HDFS-955, but that's a separate issue from this JIRA.
As for the performance issue, it's because of the other bug mentioned in this
JIRA - if a thread's transaction is batched, it will reset {{synctxid}} to 0 on
its way out of {{logSync}}. This can cause other threads' transactions to *not*
get batched where they normally would have. In a test run of
NNThroughputBenchmark this caused a 3x degradation because so many
sync-batching opportunities were lost.
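To make the lost-batching effect concrete, here is a minimal single-threaded sketch. It is not the real {{FSEditLog}} code: {{ToyEditLog}}, {{guarded}}, {{didSync}} and {{physicalSyncs}} are hypothetical names, and the guard in the {{finally}} block is just one plausible way to avoid the reset, assuming the reset happens in cleanup code that also runs on the early "already batched" return.

```java
// Toy model of sync batching; all names here are illustrative assumptions.
class SyncTxidSketch {
    static class ToyEditLog {
        long txid = 0;          // last transaction id handed out
        long synctxid = 0;      // highest transaction id known to be flushed
        int physicalSyncs = 0;  // how many times we "hit the disk"
        final boolean guarded;  // true = only update synctxid if we really synced

        ToyEditLog(boolean guarded) {
            this.guarded = guarded;
        }

        long logEdit() {
            return ++txid;
        }

        void logSync(long mytxid) {
            long syncStart = 0;
            boolean didSync = false;
            try {
                if (mytxid <= synctxid) {
                    return; // already flushed by someone else's sync (batched)
                }
                syncStart = txid;   // this sync covers everything logged so far
                didSync = true;
                physicalSyncs++;
            } finally {
                // Unguarded variant: a batched caller leaves with syncStart == 0
                // and overwrites synctxid, so later callers sync again needlessly.
                if (!guarded || didSync) {
                    synctxid = syncStart;
                }
            }
        }
    }

    // Logs three edits, then syncs them in the order t3, t1, t2.
    static int run(boolean guarded) {
        ToyEditLog log = new ToyEditLog(guarded);
        long t1 = log.logEdit();
        long t2 = log.logEdit();
        long t3 = log.logEdit();
        log.logSync(t3); // real sync, covers t1..t3
        log.logSync(t1); // batched
        log.logSync(t2); // batched only if synctxid was not reset
        return log.physicalSyncs;
    }

    public static void main(String[] args) {
        System.out.println("unguarded syncs: " + run(false)); // prints 2
        System.out.println("guarded syncs: " + run(true));    // prints 1
    }
}
```

In this toy run the unguarded variant performs one extra sync for only three transactions; with many handler threads the same pattern would repeat far more often.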
> Batched edit log syncs can reset synctxid throw assertions
> ----------------------------------------------------------
>
> Key: HDFS-1597
> URL: https://issues.apache.org/jira/browse/HDFS-1597
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1597.txt, hdfs-1597.txt, illustrate-test-failure.txt
>
>
> The top of FSEditLog.logSync has the following assertion:
> {code}
> assert editStreams.size() > 0 : "no editlog streams";
> {code}
> which should actually come after checking to see if the sync was already
> batched in by another thread.
> This is related to a second bug in which the same case causes synctxid to be
> reset to 0.
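The ordering described above can be sketched as follows. This is a simplified illustration, not the actual {{FSEditLog}} source: {{ToyEditLog}} and its fields are hypothetical, an explicit exception stands in for the {{assert}} so the behaviour does not depend on assertions being enabled, and the scenario (streams closed before a batched caller arrives) is an assumption about how the assertion could fire spuriously.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: check "already batched" first, then require edit streams.
class AssertOrderSketch {
    static class ToyEditLog {
        final List<String> editStreams = new ArrayList<>();
        long synctxid = 0; // highest transaction id known to be flushed

        // Returns true if this call performed a sync, false if it was batched.
        boolean logSync(long mytxid) {
            if (mytxid <= synctxid) {
                return false; // nothing to do, so the streams are never touched
            }
            // Only a caller that actually needs to sync requires a stream.
            if (editStreams.isEmpty()) {
                throw new IllegalStateException("no editlog streams");
            }
            synctxid = mytxid;
            return true;
        }
    }

    // A caller whose transaction is already flushed must not trip the check.
    static boolean batchedCallSurvivesEmptyStreams() {
        ToyEditLog log = new ToyEditLog();
        log.editStreams.add("edits");
        log.logSync(5);          // real sync up to transaction 5
        log.editStreams.clear(); // streams go away afterwards
        return !log.logSync(3);  // transaction 3 was covered by the sync above
    }

    // A caller that really needs to sync still fails without streams.
    static boolean unsyncedCallStillFails() {
        ToyEditLog log = new ToyEditLog();
        try {
            log.logSync(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("batched call ok: " + batchedCallSurvivesEmptyStreams());
        System.out.println("unsynced call fails: " + unsyncedCallStillFails());
    }
}
```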