[jira] Commented: (HDFS-1597) Batched edit log syncs can reset synctxid throw assertions

Konstantin Shvachko (JIRA) Mon, 07 Feb 2011 14:02:32 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991646#comment-12991646
 ]


Konstantin Shvachko commented on HDFS-1597:
-------------------------------------------

The patch needs to be updated.
I don't see where {{saveNamespace()}} calls {{logSyncAll()}}. {{logSyncAll()}} 
is called only by {{enterSafeMode()}}.

The main problem seems to be that {{logSync()}} does not hold the write lock. So 
in a race with {{saveNamespace()}} it can kick in at any time. The only way 
to prevent inconsistencies is to make sure that all threads waiting in {{logSync()}} 
already have everything synced.
In other words, all transactions that started before {{saveNamespace()}} grabbed 
the write lock should complete, and no new transactions should be allowed to 
start while {{saveNamespace()}} is in progress.
So {{saveNamespace()}} must call {{logSyncAll()}} before doing anything with 
the image or edits.

Therefore, moving the assert down is absolutely correct, imo. If a thread sees 
that its transaction is already synced, it should not touch the edit streams.
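
To make the ordering concrete, here is a minimal Java sketch of the two points above: the stream check comes after the "already synced" check, and {{saveNamespace()}} drains pending syncs under the write lock before it touches the streams. This is not the real {{FSEditLog}}. The class {{EditLogSketch}}, the {{fsLock}} field, the {{Runnable}} parameter, and the use of {{StringBuilder}} as a stand-in for an edit stream are all assumptions made for illustration, and an explicit exception stands in for the {{assert}} so that the check fires even when assertions are disabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model for illustration only; not the real FSEditLog.
class EditLogSketch {
    private final List<StringBuilder> editStreams = new ArrayList<>();
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private long txid = 0;      // id of the last transaction written to the buffers
    private long synctxid = 0;  // id of the last transaction known to be synced

    EditLogSketch() {
        editStreams.add(new StringBuilder());
    }

    // A namespace operation: logs one edit under the read lock, returns its txid.
    long logEdit(String op) {
        fsLock.readLock().lock();
        try {
            synchronized (this) {
                txid++;
                for (StringBuilder s : editStreams) {
                    s.append(txid).append(' ').append(op).append('\n');
                }
                return txid;
            }
        } finally {
            fsLock.readLock().unlock();
        }
    }

    // Called WITHOUT fsLock, so it may race with saveNamespace().
    // Returns true if this call flushed, false if mytxid was already synced.
    synchronized boolean logSync(long mytxid) {
        if (mytxid <= synctxid) {
            return false; // batched in by an earlier sync: do not touch the streams
        }
        // Stands in for the assertion, placed after the already-synced check.
        if (editStreams.isEmpty()) {
            throw new IllegalStateException("no editlog streams");
        }
        synctxid = txid; // one flush covers every transaction logged so far
        return true;
    }

    synchronized boolean logSyncAll() {
        return logSync(txid);
    }

    // Drains pending syncs first, then runs imageWork while the streams are closed.
    void saveNamespace(Runnable imageWork) {
        fsLock.writeLock().lock(); // no new transactions can start from here on
        try {
            logSyncAll();
            synchronized (this) {
                editStreams.clear(); // streams are closed while the image is written
            }
            try {
                imageWork.run();
            } finally {
                synchronized (this) {
                    editStreams.add(new StringBuilder()); // reopen a fresh stream
                }
            }
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    synchronized long getSyncTxId() {
        return synctxid;
    }
}
```

In this sketch, a thread that logged its edit before {{saveNamespace()}} and calls {{logSync()}} while the streams are closed returns early, because its transaction was covered by the {{logSyncAll()}} call. If the stream check came first, as in the original code, the same call would fail even though there is nothing left to sync.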

> Batched edit log syncs can reset synctxid throw assertions
> ----------------------------------------------------------
>
>                 Key: HDFS-1597
>                 URL: https://issues.apache.org/jira/browse/HDFS-1597
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.22.0
>
>         Attachments: hdfs-1597.txt, illustrate-test-failure.txt
>
>
> The top of FSEditLog.logSync has the following assertion:
> {code}
>         assert editStreams.size() > 0 : "no editlog streams";
> {code}
> which should actually come after checking to see if the sync was already 
> batched in by another thread.
> This is related to a second bug in which the same case causes synctxid to be 
> reset to 0

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
