[ https://issues.apache.org/jira/browse/HDFS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011753#comment-13011753 ]
dhruba borthakur commented on HDFS-1790: ---------------------------------------- incVolumeFailure(), isInSafeMode() and isPopulatingReplQueues() should use the FSNamesystem read/write lock. > review use of synchronized methods in FSNamesystem > -------------------------------------------------- > > Key: HDFS-1790 > URL: https://issues.apache.org/jira/browse/HDFS-1790 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node > Affects Versions: 0.23.0 > Reporter: Matt Foley > Fix For: 0.23.0 > > > While analyzing SafeMode semantics for HDFS-1726, I noticed the following > usage of synchronized methods in FSNamesystem that raised questions: > 1. incVolumeFailure() is synchronized on the FSNamesystem instance (monitor > lock), for unclear reasons. Is this a left-over from the conversion to > read/write lock in FSNamesystem? > 2. isInSafeMode() and isPopulatingReplQueues() are also synchronized on the > FSNamesystem instance (monitor lock), but the SafeModeInfo methods they call > (safeMode.isInSafeMode() and safeMode.isPopulatingReplQueues() are > synchronized on the SafeModeInfo instance. Is synchronizing on the > FSNamesystem instance necessary? What does it add? If it is necessary, > should it be using the read/write lock? > 3. In SafeModeInfo, these methods are synchronized on the SafeModeInfo > instance: > - isOn > - isPopulatingReplQueues > - leave > - initialize > - canInitializeReplQueues > - canLeave > - setBlock > - incrementSafeBlockCount > - decrementSafeBlockCount > - setManual > but these are not: > - enter > - needEnter > - checkMode > - isManual > - isConsistent > Regarding these: > - isOn() asserts isConsistent(), but is otherwise an atomic read operation. > - isPopulatingReplQueues() is an atomic read. Does it need synchronization? > - leave() is complex, but shouldn't it be synchronized with enter(), which is > also a write operation? Yet enter() is unsynchronized. > - initializeReplQueues() calls blockManager.processMisReplicatedBlocks(), > which can take minutes to run. By synchronizing on the SafeModeInfo > instance, it prevents essentially all of the other safeMode methods from > running for the duration! Is this desirable or needed? > - canInitializeReplQueues() is a read-only operation, although it does > compare two read values. needEnter() compares four read values, and is > unsynchronized. Does either need synchronization? > - canLeave() is a compound operation, it's good that it is synchronized. > - checkMode() is a big compound read/write operation. Doesn't it need > synchronization if the other methods do? > and so on. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira