review use of synchronized methods in FSNamesystem
--------------------------------------------------
Key: HDFS-1790
URL: https://issues.apache.org/jira/browse/HDFS-1790
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0
Reporter: Matt Foley
Fix For: 0.23.0
While analyzing SafeMode semantics for HDFS-1726, I noticed the following usage
of synchronized methods in FSNamesystem that raised questions:
1. incVolumeFailure() is synchronized on the FSNamesystem instance (monitor
lock), for unclear reasons. Is this a left-over from the conversion to
read/write lock in FSNamesystem?
2. isInSafeMode() and isPopulatingReplQueues() are also synchronized on the
FSNamesystem instance (monitor lock), but the SafeModeInfo methods they call
(safeMode.isInSafeMode() and safeMode.isPopulatingReplQueues() are synchronized
on the SafeModeInfo instance. Is synchronizing on the FSNamesystem instance
necessary? What does it add? If it is necessary, should it be using the
read/write lock?
3. In SafeModeInfo, these methods are synchronized on the SafeModeInfo instance:
isOn
isPopulatingReplQueues
leave
initialize
canInitializeReplQueues
canLeave
setBlock
incrementSafeBlockCount
decrementSafeBlockCount
setManual
but these are not:
enter
needEnter
checkMode
isManual
isConsistent
- isOn() asserts isConsistent(), but is otherwise an atomic read operation.
- isPopulatingReplQueues() is an atomic read. Does it need synchronization?
- leave() is complex, but shouldn't it be synchronized with enter(), which is
also a write operation? Yet enter() is unsynchronized.
- initializeReplQueues() calls blockManager.processMisReplicatedBlocks(), which
can take minutes to run. By synchronizing on the SafeModeInfo instance, it
prevents essentially all of the other safeMode methods from running for the
duration! Is this desirable or needed?
- canInitializeReplQueues() is a read-only operation, although it does compare
two read values. needEnter() compares four read values, and is unsynchronized.
Does either need synchronization?
- canLeave() is a compound operation, it's good that it is synchronized.
- checkMode() is a big compound read/write operation. Doesn't it need
synchronization if the other methods do?
and so on.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira