[
https://issues.apache.org/jira/browse/HDFS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011753#comment-13011753
]
dhruba borthakur commented on HDFS-1790:
----------------------------------------
incVolumeFailure(), isInSafeMode() and isPopulatingReplQueues() should use the
FSNamesystem read/write lock.
> review use of synchronized methods in FSNamesystem
> --------------------------------------------------
>
> Key: HDFS-1790
> URL: https://issues.apache.org/jira/browse/HDFS-1790
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Matt Foley
> Fix For: 0.23.0
>
>
> While analyzing SafeMode semantics for HDFS-1726, I noticed the following
> usage of synchronized methods in FSNamesystem that raised questions:
> 1. incVolumeFailure() is synchronized on the FSNamesystem instance (monitor
> lock), for unclear reasons. Is this a left-over from the conversion to
> read/write lock in FSNamesystem?
> 2. isInSafeMode() and isPopulatingReplQueues() are also synchronized on the
> FSNamesystem instance (monitor lock), but the SafeModeInfo methods they call
> (safeMode.isInSafeMode() and safeMode.isPopulatingReplQueues() are
> synchronized on the SafeModeInfo instance. Is synchronizing on the
> FSNamesystem instance necessary? What does it add? If it is necessary,
> should it be using the read/write lock?
> 3. In SafeModeInfo, these methods are synchronized on the SafeModeInfo
> instance:
> - isOn
> - isPopulatingReplQueues
> - leave
> - initialize
> - canInitializeReplQueues
> - canLeave
> - setBlock
> - incrementSafeBlockCount
> - decrementSafeBlockCount
> - setManual
> but these are not:
> - enter
> - needEnter
> - checkMode
> - isManual
> - isConsistent
> Regarding these:
> - isOn() asserts isConsistent(), but is otherwise an atomic read operation.
> - isPopulatingReplQueues() is an atomic read. Does it need synchronization?
> - leave() is complex, but shouldn't it be synchronized with enter(), which is
> also a write operation? Yet enter() is unsynchronized.
> - initializeReplQueues() calls blockManager.processMisReplicatedBlocks(),
> which can take minutes to run. By synchronizing on the SafeModeInfo
> instance, it prevents essentially all of the other safeMode methods from
> running for the duration! Is this desirable or needed?
> - canInitializeReplQueues() is a read-only operation, although it does
> compare two read values. needEnter() compares four read values, and is
> unsynchronized. Does either need synchronization?
> - canLeave() is a compound operation, it's good that it is synchronized.
> - checkMode() is a big compound read/write operation. Doesn't it need
> synchronization if the other methods do?
> and so on.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira