[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-988:
-----------------------------

    Attachment: 988-fixups.txt

Attaching a few fixups on top of hdfs-988-5.patch, related to the comments 
below:


Regarding the question about computeDatanodeWork/heartbeatCheck:

computeDatanodeWork calls blockManager.computeReplicationWork and 
blockManager.computeInvalidateWork. In the case of computeReplicationWork, it 
might schedule some replications. This seems OK - worst case, we get some extra 
replicas, which will get fixed up later. In the case of computeInvalidateWork, 
it calls invalidateWorkForOneNode, which takes the write lock and then checks 
safe mode before scheduling any deletions.

In heartbeatCheck, I think we can simply add another "if (isInSafeMode()) 
return" right after it takes the writeLock when it finds a dead node. That way, 
if it races with entering safe mode, it still doesn't take any action based on 
the dead node. Either way, I don't think this could corrupt anything, since it 
won't write to the edit log.
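
For reference, a minimal sketch of the shape I have in mind (heartbeatCheck is 
simplified here - the real method loops over the heartbeats list - and 
removeDatanode / isInSafeMode are the existing helpers):

    // inside heartbeatCheck(), once a dead node has been found:
    writeLock();
    try {
      // Re-check safe mode under the write lock: if we raced with the
      // admin entering safe mode, bail out without touching anything.
      if (isInSafeMode()) {
        return;
      }
      removeDatanode(deadNode);   // prune the dead node and its blocks
    } finally {
      writeUnlock();
    }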


Some other notes:
- isLockedReadOrWrite should be checking this.fsLock.getReadHoldCount() rather 
than getReadLockCount(); the latter counts read holds across all threads, so it 
can't tell us whether the current thread holds the lock (see the sketch after 
this list)
- FSDirectory#bLock says it protects the block map, but it also protects the 
directory, right? We should update the comment and perhaps the name.
- various functions don't take the read lock because they call functions in 
FSDirectory that take FSDirectory.bLock. This seems incorrect: for example, 
getListing() racing against open() with overwrite=true could return the 
directory with the old file deleted but the new one not there yet. What's 
confusing me is why some functions don't need the namesystem readLock even 
though they perform read operations - when is just the FSDirectory lock 
sufficient? It looks like a lot of the test failures above are due to this.
- handleHeartbeat calls getDatanode() while only holding locks on heartbeats 
and datanodeMap, but registerDatanode mutates datanodeMap without locking 
either.
- getDataNodeInfo seems like an unused function with no locking - can we remove 
it?
- several other places access datanodeMap by synchronizing on that object 
itself. unprotectedAddDatanode should assert that it holds that monitor lock 
(see the sketch after this list)
- when loading the edit log, why doesn't loadFSEdits take a write lock on the 
namesystem before it starts? Then we could add all of the asserts and not worry 
about it.
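
To make the two locking notes above concrete, here's a minimal sketch (field 
and method names are my reading of the patch, so treat this as illustrative 
rather than exact):

    // getReadHoldCount() is per-thread; getReadLockCount() is the total
    // across all threads, so it says nothing about the current thread.
    boolean isLockedReadOrWrite() {
      return this.fsLock.getReadHoldCount() > 0
          || this.fsLock.isWriteLockedByCurrentThread();
    }

    // unprotectedAddDatanode can assert that the caller synchronized on
    // datanodeMap, matching the other accessors (body simplified here):
    void unprotectedAddDatanode(DatanodeDescriptor nodeDescr) {
      assert Thread.holdsLock(datanodeMap);
      datanodeMap.put(nodeDescr.getStorageID(), nodeDescr);
    }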


- it looks like saving the image no longer works, since 
saveFilesUnderConstruction now takes the readLock, but it's being called by a 
different thread than the one that took the write lock in saveNamespace. So, it 
deadlocks. At first I thought this could be solved by just making saveNamespace 
take a read lock instead of a write lock, but that actually doesn't work due to 
fairness - what can happen is that saveNamespace takes the readLock, then some 
other thread comes along and queues up for the write lock. At that point, no 
further readers are allowed to take the read lock, because it's a fair lock. 
So, the image-writer thread locks up.
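
A tiny standalone demo of the fairness behavior (nothing HDFS-specific, just 
java.util.concurrent; the sleep is only there to force the interleaving 
described above):

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class FairLockDemo {
      public static void main(String[] args) throws Exception {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair

        lock.readLock().lock();   // "saveNamespace" thread holds the read lock

        Thread writer = new Thread(new Runnable() {
          public void run() {
            lock.writeLock().lock();   // another thread queues up for the write lock
            lock.writeLock().unlock();
          }
        });
        writer.start();
        Thread.sleep(200);            // give the writer time to get queued

        Thread imageWriter = new Thread(new Runnable() {
          public void run() {
            lock.readLock().lock();   // blocks behind the queued writer
            System.out.println("image writer got the read lock");
            lock.readLock().unlock();
          }
        });
        imageWriter.start();
        imageWriter.join(1000);
        // With a fair lock this prints true: the second reader is stuck even
        // though the first reader still holds the read lock.
        System.out.println("image writer still blocked? " + imageWriter.isAlive());

        lock.readLock().unlock();     // let everyone proceed and exit
        writer.join();
        imageWriter.join();
      }
    }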


Optimizations to address later:
- When create() is called with the overwrite flag true, that calls delete(), 
which will logSync() while holding the lock. We can hold off on fixing it since 
it's a performance problem, not a correctness one, and the operation is fairly 
rare (a sketch of the eventual shape follows this list).
- getAdditionalBlock doesn't logSync() - I think there's another issue pending 
about that since it will affect HA. Let's address later.
- checkFileProgress doesn't really need the write lock
- seems like saveNamespace could safely just take the read lock to allow other 
readers to keep working (though see the fairness caveat above)
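
For the overwrite case, the eventual fix would presumably follow the usual 
pattern of deferring logSync() until after the write lock is released - 
something like this hypothetical helper (not actual FSNamesystem code, just 
the shape):

    // Mutate the namespace and log the edit under the write lock, but only
    // sync the edit log after releasing it, since the sync can block on I/O.
    void deleteForOverwrite(String src) throws IOException {
      writeLock();
      try {
        // deleteInternal stands in for the existing delete path, minus the
        // sync; the exact signature differs between versions.
        deleteInternal(src);
      } finally {
        writeUnlock();
      }
      getEditLog().logSync();
    }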


Nits:
- Typo: "Cnnot concat"
- rollEditLog has a comment saying "Checkpoint not created"
- rollFSImage has the same issue, but at least it has to do with checkpoints, 
so it could be correct


> saveNamespace can corrupt edits log, apparently due to race conditions
> ----------------------------------------------------------------------
>
>                 Key: HDFS-988
>                 URL: https://issues.apache.org/jira/browse/HDFS-988
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20-append, 0.21.0, 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Eli Collins
>            Priority: Blocker
>             Fix For: 0.20-append, 0.22.0
>
>         Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch, 
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, 
> hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, 
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
