[jira] Commented: (HDFS-1597) Batched edit log syncs can reset synctxid throw assertions

Todd Lipcon (JIRA) Mon, 07 Feb 2011 14:23:20 -0800

    [ https://issues.apache.org/jira/browse/HDFS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991659#comment-12991659 ]


Todd Lipcon commented on HDFS-1597:
-----------------------------------

Hi Konst. You're right, {{logSyncAll}} is called by {{enterSafeMode}}, not 
{{saveNamespace}}. My bad.

But since our policy is that {{saveNamespace}} always runs in safe mode, and we 
never write edits while in safe mode, I think we should be safe. Put another 
way, between {{enterSafeMode}} and {{leaveSafeMode}} there won't be any pending 
edits. This might depend on completing HDFS-955, but that's a separate issue 
from this JIRA.
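To make the safe-mode argument concrete, here is a minimal, hypothetical model of the invariant. It is not HDFS code: only the method names {{logSyncAll}}, {{enterSafeMode}}, {{leaveSafeMode}} and {{saveNamespace}} come from this discussion, while the class, the txid counters, {{hasPendingEdits}} and the exceptions are illustrative assumptions. The point it shows: once {{enterSafeMode}} refuses new edits and flushes everything already logged, {{saveNamespace}} can never observe pending edits.

```java
/**
 * Hypothetical model of the safe-mode invariant (NOT the real FSNamesystem
 * or FSEditLog code; fields and checks are illustrative assumptions).
 */
public class SafeModeInvariantDemo {
    private boolean safeMode = false;
    private long txid = 0;      // last edit logged
    private long syncTxId = 0;  // last edit known to be durable

    synchronized void logEdit() {
        if (safeMode) {
            throw new IllegalStateException("no edits allowed in safe mode");
        }
        txid++;
    }

    /** Flushes everything pending, whichever thread logged it. */
    synchronized void logSyncAll() {
        syncTxId = txid;
    }

    synchronized void enterSafeMode() {
        safeMode = true;  // from here on logEdit() refuses new edits
        logSyncAll();     // and everything already logged becomes durable
    }

    synchronized void leaveSafeMode() {
        safeMode = false;
    }

    synchronized boolean hasPendingEdits() {
        return txid > syncTxId;
    }

    synchronized void saveNamespace() {
        if (!safeMode) {
            throw new IllegalStateException("saveNamespace requires safe mode");
        }
        // The invariant: nothing is pending between enter/leaveSafeMode.
        if (hasPendingEdits()) {
            throw new IllegalStateException("pending edits during saveNamespace");
        }
        // Writing the namespace image itself is omitted from this sketch.
    }

    public static void main(String[] args) {
        SafeModeInvariantDemo ns = new SafeModeInvariantDemo();
        ns.logEdit();
        ns.logEdit();
        System.out.println("pending before enterSafeMode: " + ns.hasPendingEdits());
        ns.enterSafeMode();
        System.out.println("pending after enterSafeMode: " + ns.hasPendingEdits());
        ns.saveNamespace();   // succeeds: no pending edits
        ns.leaveSafeMode();
    }
}
```

The sketch relies on both halves of the policy: if edits could still be logged in safe mode, or if {{enterSafeMode}} skipped the {{logSyncAll}} flush, {{saveNamespace}} could see pending edits.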

As for the performance issue, it comes from the other bug mentioned in this 
JIRA: if a thread's transaction is batched, that thread resets {{synctxid}} to 0 
on its way out of {{logSync}}. This can cause other threads' transactions to 
*not* get batched where they normally would have been. In a test run of 
NNThroughputBenchmark this caused a 3x degradation because so many 
sync-batching opportunities were lost.
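The effect of the reset can be illustrated with a minimal, hypothetical model of sync batching. This is not the real {{FSEditLog}} code: the class {{SimpleEditLog}}, the {{resetOnBatchedExit}} flag, and the explicit {{myTxId}} argument are assumptions made for illustration, and the real implementation performs the flush outside the lock, which is what lets threads batch while a sync is in flight. Here three "threads" are simulated sequentially so the outcome is deterministic. With the reset, the third caller's edit, already durable, triggers a second physical sync.

```java
/**
 * Hypothetical, simplified model of edit-log sync batching
 * (NOT the real FSEditLog; names are illustrative assumptions).
 */
public class SyncBatchingDemo {

    static final class SimpleEditLog {
        private final boolean resetOnBatchedExit; // true = emulate the reported bug
        private long txid = 0;        // last transaction id handed out
        private long syncTxId = 0;    // highest txid known to be durable
        private int physicalSyncs = 0;

        SimpleEditLog(boolean resetOnBatchedExit) {
            this.resetOnBatchedExit = resetOnBatchedExit;
        }

        /** Appends an edit to the in-memory buffer and returns its txid. */
        synchronized long logEdit() {
            return ++txid;
        }

        /** Ensures the caller's edit (myTxId) is durable. */
        synchronized void logSync(long myTxId) {
            if (myTxId <= syncTxId) {
                // Already covered by another thread's sync: batched.
                if (resetOnBatchedExit) {
                    syncTxId = 0; // BUG: forgets what is already durable
                }
                return;
            }
            physicalSyncs++;      // one flush covers every pending edit
            syncTxId = txid;
        }

        int physicalSyncs() {
            return physicalSyncs;
        }
    }

    /** Three "threads" log an edit each, then sync in order. */
    static int run(boolean buggy) {
        SimpleEditLog log = new SimpleEditLog(buggy);
        long a = log.logEdit();
        long b = log.logEdit();
        long c = log.logEdit();
        log.logSync(a); // flushes txids 1..3
        log.logSync(b); // batched
        log.logSync(c); // should also be batched
        return log.physicalSyncs();
    }

    public static void main(String[] args) {
        System.out.println("fixed: " + run(false) + " physical sync(s)");
        System.out.println("buggy: " + run(true) + " physical sync(s)");
    }
}
```

Running it prints 1 physical sync for the fixed path and 2 for the buggy one. Under real concurrency each lost batching opportunity costs another disk sync, which is how the regression can add up to a large throughput loss.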

> Batched edit log syncs can reset synctxid throw assertions
> ----------------------------------------------------------
>
>                 Key: HDFS-1597
>                 URL: https://issues.apache.org/jira/browse/HDFS-1597
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.22.0
>
>         Attachments: hdfs-1597.txt, hdfs-1597.txt, illustrate-test-failure.txt
>
>
> The top of FSEditLog.logSync has the following assertion:
> {code}
>         assert editStreams.size() > 0 : "no editlog streams";
> {code}
> which should actually come after the check for whether the sync was already 
> batched in by another thread.
> This is related to a second bug, in which the same case causes synctxid to be 
> reset to 0.
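A minimal sketch of the ordering the description asks for, assuming a simplified, hypothetical edit log (not the real {{FSEditLog}}; {{CountingStream}}, the {{boolean}} return value and the "streams cleared meanwhile" scenario are illustrative assumptions). The batching check runs first, so a caller whose edit is already durable returns before it can trip {{assert editStreams.size() > 0}}. Only a caller that really has to flush depends on open streams.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of assertion ordering in logSync (not HDFS code). */
public class AssertOrderDemo {

    /** Stand-in for an edit log output stream; counts flushes. */
    static final class CountingStream {
        int flushes = 0;
        void flush() { flushes++; }
    }

    static final class SimpleEditLog {
        final List<CountingStream> editStreams = new ArrayList<>();
        private long txid = 0;
        private long syncTxId = 0;

        synchronized long logEdit() { return ++txid; }

        /** Returns true if this call flushed, false if it was batched. */
        synchronized boolean logSync(long myTxId) {
            // 1. Batching check first: an already-durable edit needs no streams.
            if (myTxId <= syncTxId) {
                return false;
            }
            // 2. Only a caller that must flush depends on open streams.
            assert editStreams.size() > 0 : "no editlog streams";
            for (CountingStream s : editStreams) {
                s.flush();
            }
            syncTxId = txid;
            return true;
        }
    }

    public static void main(String[] args) {
        SimpleEditLog log = new SimpleEditLog();
        CountingStream stream = new CountingStream();
        log.editStreams.add(stream);

        long a = log.logEdit();
        long b = log.logEdit();
        boolean aFlushed = log.logSync(a);   // flushes txids 1..2

        log.editStreams.clear();             // hypothetical: streams closed meanwhile
        boolean bFlushed = log.logSync(b);   // batched: assertion never reached

        System.out.println("a flushed: " + aFlushed + ", b flushed: " + bFlushed
                + ", stream flushes: " + stream.flushes);
    }
}
```

With the assertion at the top instead, the second call would fail under {{-ea}} even though it has nothing to write.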

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
