[
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HDFS-988:
-----------------------------
Attachment: 988-fixups.txt
Attaching a few fixups on top of hdfs-988-5.patch, related to the comments
below:
Regarding the question about computeDatanodeWork/heartbeatCheck:
computeDatanodeWork calls blockManager.computeReplicationWork and
blockManager.computeInvalidateWork. In the case of computeReplicationWork, it
might schedule some replications. This seems OK - worst case we get some extra
replicas which will get fixed up later. In the case of computeInvalidateWork,
it calls invalidateWorkForOneNode which takes the write lock and then checks
safe mode before scheduling any deletions.
In heartbeatCheck, I think we can simply add another "if (isInSafeMode())
return" right after it takes the writeLock when it finds a dead node. That way,
if it races, it still doesn't take any action based on the stale check. Either
way, I don't think this could corrupt anything since it won't write to the edit
log.
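The re-check-under-lock pattern suggested above can be sketched roughly as
follows (class and field names are illustrative, not the actual FSNamesystem
code):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: re-check safe mode after acquiring the write lock,
// so a racing unlocked check can't lead to a mutation in safe mode.
public class HeartbeatMonitorSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private volatile boolean safeMode = true;
    private int deadNodesRemoved = 0;

    public void setSafeMode(boolean on) { safeMode = on; }
    public int removals() { return deadNodesRemoved; }

    public void heartbeatCheck() {
        if (safeMode) return;            // cheap unlocked check; may race
        fsLock.writeLock().lock();
        try {
            if (safeMode) return;        // re-check under the write lock
            deadNodesRemoved++;          // stand-in for removing a dead node
        } finally {
            fsLock.writeLock().unlock();
        }
    }
}
```

Even if safe mode is entered between the first check and the lock acquisition,
the second check sees it and no action is taken.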
Some other notes:
- isLockedReadOrWrite should be checking this.fsLock.getReadHoldCount() rather
than getReadLockCount()
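For reference, the two accessors differ in scope: getReadLockCount() counts
read locks held by all threads, while getReadHoldCount() counts only the
calling thread's reentrant holds, which is what isLockedReadOrWrite wants. A
small standalone demo (not HDFS code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HoldCountDemo {
    // Returns {getReadLockCount(), getReadHoldCount()} as observed from this
    // thread while a *different* thread holds the read lock.
    public static int[] countsWhileOtherThreadReads() throws InterruptedException {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final CountDownLatch locked = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread other = new Thread(() -> {
            lock.readLock().lock();
            locked.countDown();
            try { release.await(); } catch (InterruptedException ignored) { }
            lock.readLock().unlock();
        });
        other.start();
        locked.await();
        // 1: some thread holds a read lock; 0: but not this thread
        int[] counts = { lock.getReadLockCount(), lock.getReadHoldCount() };
        release.countDown();
        other.join();
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] c = countsWhileOtherThreadReads();
        System.out.println("getReadLockCount() = " + c[0]); // 1
        System.out.println("getReadHoldCount() = " + c[1]); // 0
    }
}
```

So a check built on getReadLockCount() would wrongly report the current thread
as holding the lock whenever any other reader is active.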
- FSDirectory#bLock says it protects the block map, but it also protects the
directory, right? we should update the comment and perhaps the name.
- various functions don't take the read lock because they call functions in
FSDirectory that take FSDirectory.bLock. This seems incorrect, since, for
example, getListing() racing against open() with overwrite=true could return
the directory listing with the old file deleted but the new file not yet
created. I guess
what's confusing me is that it's not clear why some functions don't need
readLock when they perform read operations. When is just the FSDirectory lock
sufficient? It looks like a lot of the test failures above are due to this.
- handleHeartbeat calls getDatanode() while only holding locks on heartbeats
and datanodeMap, but registerDatanode mutates datanodeMap without locking
either.
- getDataNodeInfo seems like an unused function with no locking - can we remove
it?
- several other places access datanodeMap with synchronization on that object
itself. unprotectedAddDatanode should assert it holds that monitor lock
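A minimal sketch of the suggested assertion, using Thread.holdsLock() (names
are illustrative, not the real HDFS code):

```java
import java.util.HashMap;
import java.util.Map;

public class DatanodeMapSketch {
    private final Map<String, Object> datanodeMap = new HashMap<>();

    public int size() {
        synchronized (datanodeMap) { return datanodeMap.size(); }
    }

    // Precondition: the caller must already synchronize on datanodeMap.
    public void unprotectedAddDatanode(String id, Object node) {
        assert Thread.holdsLock(datanodeMap) : "caller must hold datanodeMap's monitor";
        datanodeMap.put(id, node);
    }

    public void registerDatanode(String id, Object node) {
        synchronized (datanodeMap) {
            unprotectedAddDatanode(id, node);
        }
    }
}
```

With assertions enabled (-ea), any caller that forgets to synchronize on
datanodeMap fails fast instead of racing silently.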
- when loading the edit log, why doesn't loadFSEdits take a write lock on the
namesystem before it starts? then we could add all of the asserts and not worry
about it.
- it looks like saving the image no longer works, since
saveFilesUnderConstruction now takes the readLock, but it's called by a
different thread than the one that took the write lock in saveNamespace. So, it
deadlocks.
At first I thought this could be solved by just making saveNamespace take a
read lock instead of write lock, but that actually doesn't work due to fairness
-- what can happen is that saveNamespace takes readLock, then some other thread
comes along and queues up for the write lock. At that point, no further readers
are allowed to take the read lock, because it's a fair lock. So, the
image-writer thread locks up.
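The fairness behavior described above is easy to reproduce with a standalone
fair ReentrantReadWriteLock (this demo is illustrative, not HDFS code): once a
writer is queued, a new reader using a timed tryLock waits behind it even
though the lock is currently held only by readers.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockDemo {
    // Returns whether a fresh reader thread can acquire the read lock while
    // (a) this thread already holds it for reading (like saveNamespace would)
    // and (b) a writer is queued. With a fair lock, the answer is no.
    public static boolean newReaderCanAcquire() throws InterruptedException {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair
        lock.readLock().lock();

        Thread writer = new Thread(() -> lock.writeLock().lock());
        writer.start();
        while (lock.getQueueLength() == 0) {  // wait until the writer is queued
            Thread.sleep(10);
        }

        final boolean[] acquired = { false };
        Thread reader = new Thread(() -> {    // like the image-writer thread
            try {
                // the timed tryLock honors fairness, so it queues behind the writer
                acquired[0] = lock.readLock().tryLock(500, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ignored) { }
        });
        reader.start();
        reader.join();

        lock.readLock().unlock();             // let the queued writer proceed
        writer.join();
        return acquired[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("new reader acquired: " + newReaderCanAcquire());
    }
}
```

So holding the read lock in one thread and acquiring it again from another is
not safe under a fair lock once any writer has queued up.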
Optimizations to address later:
- When create() is called with the overwrite flag true, that calls delete()
which will logSync() while holding the lock. We can hold off on fixing it since
it's a performance problem, not a correctness problem, and the operation is
fairly rare.
- getAdditionalBlock doesn't logSync() - I think there's another issue pending
about that since it will affect HA. Let's address it later.
- checkFileProgress doesn't really need the write lock
- seems like saveNamespace could safely just take the read lock to allow other
readers to keep working
Nits:
- Typo: "Cnnot concat"
- rollEditLog has a comment saying "Checkpoint not created"
- rollFSImage has the same comment, but it at least has to do with checkpoints,
so it could be correct
> saveNamespace can corrupt edits log, apparently due to race conditions
> ----------------------------------------------------------------------
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20-append, 0.21.0, 0.22.0
> Reporter: dhruba borthakur
> Assignee: Eli Collins
> Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch,
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch,
> hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt,
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the
> savenamespace command. This can corrupt the edits log. The problem is that
> when the NN enters safemode, there could still be pending logSyncs occurring
> from other threads. Now, the saveNamespace command, when executed, would save
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira