[
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Collins updated HDFS-988:
-----------------------------
Attachment: hdfs-988-6.patch
Thanks for the thorough review Todd!
bq. computeDatanodeWork calls blockManager.computeReplicationWork and
blockManager.computeInvalidateWork...
Agree. Added a comment in computeDatanodeWork with the rationale.
bq. In heartbeatCheck, I think we can simply put another "if (isInSafeMode())
return" in right after it takes the writeLock if it finds a dead node...
Agree, added the additional check, and a comment above the first lock
acquisition.
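For reference, the shape of the change is roughly the following (a simplified sketch, not the actual FSNamesystem code):
{code}
// heartbeatCheck already bails out early when in safemode, but the dead
// node scan runs before the write lock is taken, so safemode is re-checked
// once the lock is held in case the NN entered safemode in between.
void heartbeatCheck() {
  if (isInSafeMode()) {        // cheap early exit, no lock held yet
    return;
  }
  // ... scan heartbeats for a dead datanode without the FSN lock ...
  writeLock();
  try {
    if (isInSafeMode()) {      // re-check: state may have changed before
      return;                  // the write lock was acquired
    }
    // ... remove the dead datanode under the write lock ...
  } finally {
    writeUnlock();
  }
}
{code}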
bq. isLockedReadOrWrite should be checking this.fsLock.getReadHoldCount()
rather than getReadLockCount()
Fixed.
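For reference, getReadLockCount() counts read holds across all threads while getReadHoldCount() counts only the calling thread's reentrant holds, which is what the check wants. Roughly (a sketch using the standard java.util.concurrent.locks.ReentrantReadWriteLock API):
{code}
private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

boolean isLockedReadOrWrite() {
  // getReadHoldCount(): read holds by the current thread (what we want).
  // getReadLockCount(): read holds across all threads, which would pass
  // even when only some other thread holds the read lock.
  return fsLock.getReadHoldCount() > 0 || fsLock.isWriteLockedByCurrentThread();
}
{code}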
bq. FSDirectory#bLock says it protects the block map, but it also protects the
directory, right? we should update the comment and perhaps the name.
Correct. I renamed it dirLock to be consistent with FSNamesystem and to reflect
that it protects the entire directory, not just the block map. Updated the
comment too.
bq. various functions don't take the read lock because they call functions in
FSDirectory that take FSDirectory.bLock. This seems incorrect..
I think the FSN methods do need to get the read lock before calling into FSD
methods that get the read lock. In the latest patch I added this to the places
where it was missing (getFileInfo, getContentSummary, getPreferredBlockSize,
etc), hopefully the new asserts make it clear that we hold the FSN lock when
calling into FSD (where necessary). For HADOOP-7362 we should assert the lock
hierarchy is obeyed.
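The pattern the asserts enforce looks roughly like this (simplified sketch; real types and helper names may differ slightly):
{code}
// In FSNamesystem: take the FSN lock, then call into FSDirectory.
HdfsFileStatus getFileInfo(String src) throws IOException {
  readLock();
  try {
    return dir.getFileInfo(src);   // FSDirectory call made under the FSN lock
  } finally {
    readUnlock();
  }
}

// In FSDirectory: assert the caller already holds the FSN lock before
// touching the directory under dirLock.
HdfsFileStatus getFileInfo(String src) {
  assert namesystem.isLockedReadOrWrite() : "FSN lock not held";
  // ... look up the inode under dirLock ...
}
{code}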
bq. handleHeartbeat calls getDatanode() while only holding locks on heartbeats
and datanodeMap, but registerDatanode mutates datanodeMap without locking
either.
I modified handleHeartbeat to take the read lock so it's synchronized with
registerDatanode. I also added a missing synchronization on datanodeMap to
wipeDatanode. I think the locking in FSN (eg the way heartbeats and
datanodeMap are locked) needs to be revisited in light of this change. I'll
file a jira for that; it's probably out of scope for this jira.
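Roughly, the shape is (simplified sketch, not the exact code):
{code}
// handleHeartbeat now runs under the FSN read lock, so it is ordered
// with respect to registerDatanode (which mutates datanodeMap under the
// write lock); the existing monitor locks are kept.
void handleHeartbeat(DatanodeRegistration nodeReg) throws IOException {
  readLock();
  try {
    synchronized (heartbeats) {
      synchronized (datanodeMap) {
        DatanodeDescriptor node = getDatanode(nodeReg);
        // ... update heartbeat state for node ...
      }
    }
  } finally {
    readUnlock();
  }
}

// wipeDatanode previously removed from datanodeMap without holding its
// monitor; it now synchronizes on the map.
void wipeDatanode(DatanodeID nodeID) {
  synchronized (datanodeMap) {
    datanodeMap.remove(nodeID.getStorageID());
  }
}
{code}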
bq. getDataNodeInfo seems like an unused function with no locking - can we
remove it?
Yes. Done.
bq. several other places access datanodeMap with synchronization on that object
itself. unprotectedAddDatanode should assert it holds that monitor lock
Made unprotectedAddDatanode synchronize on datanodeMap; per above, I think we
need to revisit how this data structure is accessed.
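I.e. roughly (sketch):
{code}
// unprotectedAddDatanode now synchronizes on datanodeMap itself; since
// monitors are reentrant this is safe when the caller already holds it.
// An alternative would be the suggested
//   assert Thread.holdsLock(datanodeMap);
// which only checks (rather than takes) the monitor.
void unprotectedAddDatanode(DatanodeDescriptor nodeDescr) {
  synchronized (datanodeMap) {
    datanodeMap.put(nodeDescr.getStorageID(), nodeDescr);
  }
}
{code}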
bq. when loading the edit log, why doesn't loadFSEdits take a write lock on the
namesystem before it starts? then we could add all of the asserts and not worry
about it.
Done. It now takes the namesystem and dir locks, and I modified all the
unprotected methods to assert the lock is held.
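The replay now looks roughly like this (simplified sketch; the lock helper names are illustrative):
{code}
// loadFSEdits holds the namesystem and directory write locks for the
// whole replay, so the unprotected* methods it calls can assert the
// locks are held instead of taking them per call.
int loadFSEdits(EditLogInputStream edits) throws IOException {
  fsNamesys.writeLock();
  fsDir.writeLock();
  try {
    return loadEditRecords(edits);   // applies unprotectedMkdir, etc.
  } finally {
    fsDir.writeUnlock();
    fsNamesys.writeUnlock();
  }
}

// e.g. in FSDirectory:
void unprotectedMkdir(String src, PermissionStatus permissions, long timestamp) {
  assert hasWriteLock() : "FSDirectory lock not held";
  // ... create the directory inode ...
}
{code}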
bq. it looks like saving the image no longer works, since
saveFilesUnderConstruction now takes the readLock, but it's being called by a
different thread than took the write lock in saveNamespace.
Yup, TestEditLogRace quickly found this deadlock. I removed the new locking
from saveFilesUnderConstruction.
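The deadlock is inherent to ReentrantReadWriteLock: a read lock is only granted when no other thread holds the write lock, so the image-saving thread blocks while the saveNamespace thread holds the write lock and waits for it to finish. A minimal standalone illustration (not HDFS code):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SaveDeadlockSketch {
  static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public static void main(String[] args) throws InterruptedException {
    lock.writeLock().lock();           // "saveNamespace" holds the write lock
    Thread saver = new Thread(new Runnable() {
      public void run() {
        lock.readLock().lock();        // "saveFilesUnderConstruction" on a
        lock.readLock().unlock();      // different thread: blocks forever
      }
    });
    saver.start();
    saver.join();                      // writer waits for the saver: deadlock
    lock.writeLock().unlock();
  }
}
{code}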
bq. When create() is called with the overwrite flag true, that calls delete()
which will logSync() while holding the lock. We can hold off on fixing it since
it's a performance problem, not correctness, and the operation is fairly rare.
Filed HDFS-2052.
bq. getAdditionalBlock doesn't logSync() - I think there's another issue
pending about that since it will affect HA. Let's address later.
Since getAdditionalBlock doesn't do anything that writes to the log, the sync
is not currently needed. I updated HDFS-978 to indicate it will be needed when
block allocations are logged.
bq. checkFileProgress doesn't really need the write lock
Yup, modified it to use a read-lock.
bq. seems like saveNamespace could safely just take the read lock to allow
other readers to keep working
Yup, modified to take the read lock.
Fixed the nits.
The tests should be passing; I'm going to do some testing with jcarder enabled.
I've also run TestNNThroughputBenchmark; the numbers are noisy, mostly better
than without the patch (not sure I believe that).
> saveNamespace can corrupt edits log, apparently due to race conditions
> ----------------------------------------------------------------------
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20-append, 0.21.0, 0.22.0
> Reporter: dhruba borthakur
> Assignee: Eli Collins
> Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch,
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch,
> hdfs-988-6.patch, hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt,
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the
> savenamespace command. This can corrupt the edits log. The problem is that
> when the NN enters safemode, there could still be pending logSyncs occurring
> from other threads. Now, the saveNamespace command, when executed, would save
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira