[
https://issues.apache.org/jira/browse/HDFS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991646#comment-12991646
]
Konstantin Shvachko commented on HDFS-1597:
-------------------------------------------
The patch needs to be updated.
I don't see where {{saveNamespace()}} calls {{logSyncAll()}}. {{logSyncAll()}}
is called only by {{enterSafeMode()}}.
The main problem seems to be that {{logSync()}} does not hold the write lock. So
in the race with {{saveNamespace()}} it can kick in at any time. The only way
to prevent inconsistencies is to make sure all threads waiting to {{logSync()}}
have everything synced already.
In other words all transactions that started before {{saveNamespace()}} grabbed
the write lock should complete, and no new transactions should be allowed to
start while {{saveNamespace()}} is in progress.
So {{saveNamespace()}} must call {{logSyncAll()}} before doing anything with
the image or edits.
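A minimal sketch of that ordering, under stated assumptions: {{NamespaceSketch}}, its fields, and {{pendingTransactions()}} are hypothetical stand-ins rather than the actual FSNamesystem/FSEditLog code, and the real {{logSync()}} tracks the caller's transaction id per thread instead of taking it as a parameter.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model; not the actual Hadoop classes.
class NamespaceSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private long txid = 0;      // last transaction id handed out
    private long synctxid = 0;  // highest transaction id known to be synced

    // A transaction is logged while holding the write lock...
    long logEdit() {
        fsLock.writeLock().lock();
        try {
            synchronized (this) {
                return ++txid;
            }
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    // ...but is synced later WITHOUT the write lock, so this call can
    // race with saveNamespace() at any time.
    synchronized void logSync(long mytxid) {
        if (mytxid <= synctxid) {
            return; // already synced by someone else: do not touch the streams
        }
        synctxid = txid; // stands in for flushing the edit streams
    }

    // Syncs every transaction logged so far.
    synchronized void logSyncAll() {
        synctxid = txid;
    }

    synchronized long pendingTransactions() {
        return txid - synctxid;
    }

    private synchronized long syncedTxId() {
        return synctxid;
    }

    // Returns the transaction id at which the image would be saved.
    long saveNamespace() {
        fsLock.writeLock().lock(); // no new transactions can start
        try {
            logSyncAll();          // finish everything started before the lock
            // Only now is it safe to touch the image or the edit streams.
            return syncedTxId();
        } finally {
            fsLock.writeLock().unlock();
        }
    }
}
```

With this ordering, a thread that logged an edit before {{saveNamespace()}} took the lock and calls {{logSync()}} afterwards finds its transaction already synced and returns immediately.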
Therefore, moving the assert down is absolutely correct, imo. If a thread sees
that its transaction is synced, it should not touch edit streams.
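To illustrate the assert placement, here is a hedged sketch rather than the patch itself. {{EditLogSketch}}, {{addStream()}}, {{closeStreams()}} and the {{mytxid}} parameter are assumptions made for the example; only the assertion text comes from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for FSEditLog; not the actual Hadoop code.
class EditLogSketch {
    private final List<StringBuilder> editStreams = new ArrayList<>();
    private long txid = 0;      // last transaction id handed out
    private long synctxid = 0;  // highest transaction id known to be synced

    synchronized void addStream() {
        editStreams.add(new StringBuilder());
    }

    // Models the streams being closed or swapped, e.g. while an image is saved.
    synchronized void closeStreams() {
        editStreams.clear();
    }

    // Appends an edit to every stream and returns its transaction id.
    synchronized long logEdit(String op) {
        txid++;
        for (StringBuilder stream : editStreams) {
            stream.append(op).append('\n');
        }
        return txid;
    }

    // Returns true if this call performed the sync, false if the
    // transaction had already been batched into an earlier sync.
    synchronized boolean logSync(long mytxid) {
        // Check first: a thread whose transaction is already synced
        // must not look at the edit streams at all.
        if (mytxid <= synctxid) {
            return false;
        }
        // The assertion comes after the check (needs -ea to be active).
        assert editStreams.size() > 0 : "no editlog streams";
        synctxid = txid; // stands in for flushing the streams
        return true;
    }
}
```

In this model, a thread that arrives after its transaction was batched into another thread's sync returns early even if the streams are gone, so the assertion can no longer fire for it.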
> Batched edit log syncs can reset synctxid throw assertions
> ----------------------------------------------------------
>
> Key: HDFS-1597
> URL: https://issues.apache.org/jira/browse/HDFS-1597
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1597.txt, illustrate-test-failure.txt
>
>
> The top of FSEditLog.logSync has the following assertion:
> {code}
> assert editStreams.size() > 0 : "no editlog streams";
> {code}
> which should actually come after checking to see if the sync was already
> batched in by another thread.
> This is related to a second bug in which the same case causes synctxid to be
> reset to 0