[
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Collins updated HDFS-988:
-----------------------------
Attachment: hdfs-988-6.patch
Thanks for the thorough review Todd!
bq. computeDatanodeWork calls blockManager.computeReplicationWork and
blockManager.computeInvalidateWork...
Agree. Added a comment in computeDatanodeWork with the rationale.
bq. In heartbeatCheck, I think we can simply put another "if (isInSafeMode())
return" in right after it takes the writeLock if it finds a dead node...
Agree, added the additional check, and a comment above the first lock
acquisition.
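For reference, the shape of the change is roughly the following (a simplified sketch, not the actual FSNamesystem code):
{code}
// heartbeatCheck already bails out early when in safemode, but the dead
// node scan runs before the write lock is taken, so safemode is re-checked
// once the lock is held in case the NN entered safemode in between.
void heartbeatCheck() {
  if (isInSafeMode()) {        // cheap early exit, no lock held yet
    return;
  }
  // ... scan heartbeats for a dead datanode without the FSN lock ...
  writeLock();
  try {
    if (isInSafeMode()) {      // re-check: state may have changed before
      return;                  // the write lock was acquired
    }
    // ... remove the dead datanode under the write lock ...
  } finally {
    writeUnlock();
  }
}
{code}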
bq. isLockedReadOrWrite should be checking this.fsLock.getReadHoldCount()
rather than getReadLockCount()
Fixed.
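For reference, getReadLockCount() counts read holds across all threads while getReadHoldCount() counts only the calling thread's reentrant holds, which is what the check wants. Roughly (a sketch using the standard java.util.concurrent.locks.ReentrantReadWriteLock API):
{code}
private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

boolean isLockedReadOrWrite() {
  // getReadHoldCount(): read holds by the current thread (what we want).
  // getReadLockCount(): read holds across all threads, which would pass
  // even when only some other thread holds the read lock.
  return fsLock.getReadHoldCount() > 0 || fsLock.isWriteLockedByCurrentThread();
}
{code}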
bq. FSDirectory#bLock says it protects the block map, but it also protects the
directory, right? we should update the comment and perhaps the name.
Correct. I renamed it dirLock to be consistent with FSNamesystem and to reflect
that it protects the entire directory, not just the block map. Updated the
comment too.
bq. various functions don't take the read lock because they call functions in
FSDirectory that take FSDirectory.bLock. This seems incorrect..
I think the FSN methods do need to get the read lock before calling into FSD
methods that get the read lock. In the latest patch I added this to the places
where it was missing (getFileInfo, getContentSummary, getPreferredBlockSize,
etc), hopefully the new asserts make it clear that we hold the FSN lock when
calling into FSD (where necessary). For HADOOP-7362 we should assert the lock
hierarchy is obeyed.
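The pattern the asserts enforce looks roughly like this (simplified sketch; real types and helper names may differ slightly):
{code}
// In FSNamesystem: take the FSN lock, then call into FSDirectory.
HdfsFileStatus getFileInfo(String src) throws IOException {
  readLock();
  try {
    return dir.getFileInfo(src);   // FSDirectory call made under the FSN lock
  } finally {
    readUnlock();
  }
}

// In FSDirectory: assert the caller already holds the FSN lock before
// touching the directory under dirLock.
HdfsFileStatus getFileInfo(String src) {
  assert namesystem.isLockedReadOrWrite() : "FSN lock not held";
  // ... look up the inode under dirLock ...
}
{code}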
bq. handleHeartbeat calls getDatanode() while only holding locks on heartbeats
and datanodeMap, but registerDatanode mutates datanodeMap without locking
either.
I modified handleHeartbeat to take the read lock so it's synchronized with
registerDatanode. I also added a missing synchronization on datanodeMap to
wipeDatanode. I think the locking in FSN (eg the way heartbeats and
datanodeMap are locked) needs to be revisited in light of this change. I'll
file a jira for that; it's probably out of scope for this jira.
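Roughly, the shape is (simplified sketch, not the exact code):
{code}
// handleHeartbeat now runs under the FSN read lock, so it is ordered
// with respect to registerDatanode (which mutates datanodeMap under the
// write lock); the existing monitor locks are kept.
void handleHeartbeat(DatanodeRegistration nodeReg) throws IOException {
  readLock();
  try {
    synchronized (heartbeats) {
      synchronized (datanodeMap) {
        DatanodeDescriptor node = getDatanode(nodeReg);
        // ... update heartbeat state for node ...
      }
    }
  } finally {
    readUnlock();
  }
}

// wipeDatanode previously removed from datanodeMap without holding its
// monitor; it now synchronizes on the map.
void wipeDatanode(DatanodeID nodeID) {
  synchronized (datanodeMap) {
    datanodeMap.remove(nodeID.getStorageID());
  }
}
{code}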
bq. getDataNodeInfo seems like an unused function with no locking - can we
remove it?
Yes. Done.
bq. several other places access datanodeMap with synchronization on that object
itself. unprotectedAddDatanode should assert it holds that monitor lock
Made unprotectedAddDatanode synchronize on datanodeMap; per above, I think we
need to revisit how this data structure is accessed.
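I.e. roughly (sketch):
{code}
// unprotectedAddDatanode now synchronizes on datanodeMap itself; since
// monitors are reentrant this is safe when the caller already holds it.
// An alternative would be the suggested
//   assert Thread.holdsLock(datanodeMap);
// which only checks (rather than takes) the monitor.
void unprotectedAddDatanode(DatanodeDescriptor nodeDescr) {
  synchronized (datanodeMap) {
    datanodeMap.put(nodeDescr.getStorageID(), nodeDescr);
  }
}
{code}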
bq. when loading the edit log, why doesn't loadFSEdits take a write lock on the
namesystem before it starts? then we could add all of the asserts and not worry
about it.
Done. It now takes the namesystem and dir locks, and I modified all the
unprotected methods to assert the lock is held.
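The replay now looks roughly like this (simplified sketch; the lock helper names are illustrative):
{code}
// loadFSEdits holds the namesystem and directory write locks for the
// whole replay, so the unprotected* methods it calls can assert the
// locks are held instead of taking them per call.
int loadFSEdits(EditLogInputStream edits) throws IOException {
  fsNamesys.writeLock();
  fsDir.writeLock();
  try {
    return loadEditRecords(edits);   // applies unprotectedMkdir, etc.
  } finally {
    fsDir.writeUnlock();
    fsNamesys.writeUnlock();
  }
}

// e.g. in FSDirectory:
void unprotectedMkdir(String src, PermissionStatus permissions, long timestamp) {
  assert hasWriteLock() : "FSDirectory lock not held";
  // ... create the directory inode ...
}
{code}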
bq. it looks like saving the image no longer works, since
saveFilesUnderConstruction now takes the readLock, but it's being called by a
different thread than took the write lock in saveNamespace.
Yup, TestEditLogRace quickly found this deadlock. I removed the new locking
from saveFilesUnderConstruction.
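The deadlock is inherent to ReentrantReadWriteLock: a read lock is only granted when no other thread holds the write lock, so the image-saving thread blocks while the saveNamespace thread holds the write lock and waits for it to finish. A minimal standalone illustration (not HDFS code):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SaveDeadlockSketch {
  static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public static void main(String[] args) throws InterruptedException {
    lock.writeLock().lock();           // "saveNamespace" holds the write lock
    Thread saver = new Thread(new Runnable() {
      public void run() {
        lock.readLock().lock();        // "saveFilesUnderConstruction" on a
        lock.readLock().unlock();      // different thread: blocks forever
      }
    });
    saver.start();
    saver.join();                      // writer waits for the saver: deadlock
    lock.writeLock().unlock();
  }
}
{code}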
bq. When create() is called with the overwrite flag true, that calls delete()
which will logSync() while holding the lock. We can hold off on fixing it since
it's a performance problem, not correctness, and the operation is fairly rare.
Filed HDFS-2052.
bq. getAdditionalBlock doesn't logSync() - I think there's another issue
pending about that since it will affect HA. Let's address later.
Since getAdditionalBlock doesn't do anything that writes to the log, the sync
is not currently needed. I updated HDFS-978 to indicate it will be needed when
block allocations are logged.
bq. checkFileProgress doesn't really need the write lock
Yup, modified it to use a read-lock.
bq. seems like saveNamespace could safely just take the read lock to allow
other readers to keep working
Yup, modified to take the read lock.
Fixed the nits.
The tests should be passing; I'm going to do some testing with jcarder enabled.
I've also run TestNNThroughputBenchmark; the numbers are noisy, mostly better
than without the patch (not sure I believe that).
> saveNamespace can corrupt edits log, apparently due to race conditions
> ----------------------------------------------------------------------
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20-append, 0.21.0, 0.22.0
> Reporter: dhruba borthakur
> Assignee: Eli Collins
> Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch,
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch,
> hdfs-988-6.patch, hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt,
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the
> savenamespace command. This can corrupt the edits log. The problem is that
> when the NN enters safemode, there could still be pending logSyncs occurring
> from other threads. Now, the saveNamespace command, when executed, would save
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira