[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log

Todd Lipcon (JIRA) Mon, 15 Feb 2010 22:20:54 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834096#action_12834096
 ]


Todd Lipcon commented on HDFS-909:
----------------------------------

I thought about this a bit more deeply over dinner :) I think the potential 
issue is not in fact a deadlock,
since waitForAllOngoingEdits is a bit misnamed. In fact, it just calls through 
to logSync, so in the
"bad code" example above, what will happen is that we sync the first 
synchronized block's edit for it,
and then enter safe mode before entering the second synchronized block.

So, to be correct, code just needs to make sure that it always checks 
isInSafeMode() each time it enters
synchronized (this) {...}, before making any edit. Dhruba, do you concur?

Switching to the ReadWriteLock would be nice since we could enforce the 
safemode check as part of
the lock acquisition. But that's definitely out of scope.

So, I think this is a certain improvement with no deadlock potential. The error 
case described above is
no worse than what currently happens. So, I think the patch is good, and we'll 
use HDFS-956 to fix the
"always check safe mode after entering synchronized blocks" problems.

> Race condition between rollEditLog or rollFSImage ant FSEditsLog.write 
> operations  corrupts edits log
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-909
>                 URL: https://issues.apache.org/jira/browse/HDFS-909
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
>         Environment: CentOS
>            Reporter: Cosmin Lehene
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, 
> hdfs-909.txt, hdfs-909.txt
>
>
> closing the edits log file can race with write to edits log file operation 
> resulting in OP_INVALID end-of-file marker being initially overwritten by the 
> concurrent (in setReadyToFlush) threads and then removed twice from the 
> buffer, losing a good byte from edits log.
> Example:
> {code}
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> OR
> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() ->EditLogOutputStream.setReadyToFlush() 
> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() ->EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> VERSUS
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
> OR 
> Any FSEditLog.write
> {code}
> Access on the edits flush operations is synchronized only in the 
> FSEdits.logSync() method level. However at a lower level access to 
> EditsLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT 
> synchronized. These can be called from concurrent threads like in the example 
> above
> So if a rollEditLog or rollFSIMage is happening at the same time with a write 
> operation it can race for EditLogFileOutputStream.setReadyToFlush that will 
> overwrite the the last byte (normally the FSEditsLog.OP_INVALID which is the 
> "end-of-file marker") and then remove it twice (from each thread) in 
> flushAndSync()! Hence there will be a valid byte missing from the edits log 
> that leads to a SecondaryNameNode silent failure and a full HDFS failure upon 
> cluster restart. 
> We got to this point after investigating a corrupted edits file that made 
> HDFS unable to start with 
> {code:title=namenode.log}
> java.io.IOException: Incorrect data format. logVersion is -20 but 
> writables.length is 768. 
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
> {code}
> EDIT: moved the logs to a comment to make this readable

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log

Reply via email to