[
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041934#comment-13041934
]
Todd Lipcon commented on HDFS-988:
----------------------------------
Didn't go through the new tests yet, but here are some comments from a first
pass through FSN:
- checks for if (auditLog.isInfoEnabled()) should probably now be
(auditLog.isInfoEnabled() && isExternalInvocation()) -- otherwise we're doing a
needless directory traversal for fsck
- The following methods currently do logSync() while holding the writeLock,
which is expensive:
-- setPermission
-- setOwner
-- commitBlockSynchronization (in some exit paths)
-- updatePipeline
- These methods should probably just set a local boolean within the
synchronized section, then logSync() in the finally clause if it's flagged
- seems strange that some of the xInternal() methods take the write lock
themselves (eg setReplicationInternal) whereas others assume the caller takes
the write lock (eg createSymlinkInternal). We should be consistent
- for those methods that don't explicitly take the write lock, we should either
add an {{assert hasWriteLock()}} or a comment explaining why the lock is not
necessary (eg internalReleaseLease, reassignLease,
finalizeINodeFileUnderConstruction)
- why doesn't getListing need the read lock?
- comment for endCheckpoint says "not started" but should say "not ended"
- same with updatePipeline
- I noticed that nextGenerationStamp() doesn't logSync() -- that seems
dangerous, since after a restart we might hand out a duplicate genstamp.
> saveNamespace can corrupt edits log, apparently due to race conditions
> ----------------------------------------------------------------------
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20-append, 0.21.0, 0.22.0
> Reporter: dhruba borthakur
> Assignee: Eli Collins
> Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch,
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988.txt, saveNamespace.txt,
> saveNamespace_20-append.patch
>
>
> The adminstrator puts the namenode is safemode and then issues the
> savenamespace command. This can corrupt the edits log. The problem is that
> when the NN enters safemode, there could still be pending logSycs occuring
> from other threads. Now, the saveNamespace command, when executed, would save
> a edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira