[jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

Eli Collins (JIRA) Wed, 01 Jun 2011 17:11:36 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eli Collins updated HDFS-988:
-----------------------------

    Attachment: hdfs-988-5.patch

Thanks for taking a look Todd. Updated patch attached.

bq. checks for if (auditLog.isInfoEnabled()) should probably now be 
(auditLog.isInfoEnabled() && isExternalInvocation()) – otherwise we're doing a 
needless directory traversal for fsck

Fixed.

bq. The following methods currently do logSync() while holding the writeLock, 
which is expensive:

Fixed. (Only one needed to conditionally call logSync)

bq. seems strange that some of the xInternal() methods take the write lock 
themselves (eg setReplicationInternal) whereas others assume the caller takes 
the write lock (eg createSymlinkInternal). We should be consistent.

Latest patch makes them more consistent, I also refactored out a couple new 
xInternal methods. In a couple places (eg deleteInternal and getListing) I 
didn't hoist up the locking because it would make the locking too coarse-grain 
(eg would result in syncing the log w/ the lock held).

bq. for those methods that don't explicitly take the write lock, we should 
either add an assert hasWriteLock() or a comment explaining why the lock is not 
necessary (eg internalReleaseLease, reassignLease, 
finalizeINodeFileUnderConstruction)

Done. For FSDirectory I made the unprotectedX methods actually unprotected and 
moved the locking to the caller (except for FSEditLogLoader which calls the 
unprotected methods directly on purpose - I doubt this really saves us that 
much). These methods (per their name) are now intentionally unprotected. 

bq. comment for endCheckpoint says "not started" but should say "not ended".  
same with updatePipeline.

Both fixed.

bq. why doesn't getListing need the read lock?

Because its callees (check*, getListing) take the lock.

bq. I noticed that nextGenerationStamp() doesn't logSync() – that seems 
dangerous, since after a restart we might hand out a duplicate genstamp.

Good catch. I made sure all callers sync the log (this was only missing from 
the updateBlockForPipeline path). nextGenerationStamp is always called with the 
lock held so I asserted that and removed the lock aquisition from this method.

> saveNamespace can corrupt edits log, apparently due to race conditions
> ----------------------------------------------------------------------
>
>                 Key: HDFS-988
>                 URL: https://issues.apache.org/jira/browse/HDFS-988
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20-append, 0.21.0, 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Eli Collins
>            Priority: Blocker
>             Fix For: 0.20-append, 0.22.0
>
>         Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988.txt, 
> saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The adminstrator puts the namenode is safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that  
> when the NN enters safemode, there could still be pending logSycs occuring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> a edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

Reply via email to