[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-988:
-----------------------------

    Attachment: hdfs-988.txt

Attaching an updated patch for trunk. Additions:
- commitBlockSynchronization no longer allowed in safemode. Konstantin, do you 
still prefer that we open a new issue for this? Dhruba seems to agree that it 
should be disallowed.
- startCheckpoint, endCheckpoint, and updatePipeline also check safemode now
- the new delegation token logging methods check safemode as well.

Some questions for review:
- Will logUpdateMasterKey be OK with the SafeModeException?
- Are there some asserts we could add to make it easier to catch these bugs in 
the future? For example, we could assert !namesystem.isInSafeMode() in 
FSEditLog.logSync(). Then if we ran assertions on unit tests we'd probably 
notice if we were accidentally making edits while in safe mode.
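
To make the assertion idea concrete, here is a minimal sketch. SketchNamesystem and SketchEditLog are hypothetical stand-ins, not the real FSNamesystem and FSEditLog, and the real logSync() does much more than this. The assert is only evaluated when the JVM runs with -ea, which is how the unit tests would need to be run for it to catch anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the namesystem; only the safe mode flag is modelled.
class SketchNamesystem {
    // volatile so the edit log can read the flag without taking another lock
    private volatile boolean safeMode;

    boolean isInSafeMode() { return safeMode; }
    void setSafeMode(boolean on) { safeMode = on; }
}

// Hypothetical stand-in for the edit log.
class SketchEditLog {
    private final SketchNamesystem namesystem;
    private final List<String> pending = new ArrayList<>();
    private final List<String> synced = new ArrayList<>();

    SketchEditLog(SketchNamesystem namesystem) {
        this.namesystem = namesystem;
    }

    // Buffer an edit; nothing is considered durable until logSync().
    synchronized void logEdit(String op) {
        pending.add(op);
    }

    // Flush buffered edits. With -ea, syncing while in safe mode fails loudly.
    synchronized void logSync() {
        assert !namesystem.isInSafeMode() : "logSync() called while in safe mode";
        synced.addAll(pending);
        pending.clear();
    }

    synchronized int syncedCount() {
        return synced.size();
    }
}
```

Without -ea the assert is skipped entirely, so this adds no cost in production and only helps if the test runs enable assertions.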

Some responses to review above:

bq. FSNamesystem.getAdditionalBlock() checking isInSafeMode() should be before 
calling chooseTargets(). I would not change getAdditionalBlock() at all.

Right now, getAdditionalBlock is split into two synchronized blocks. The safe 
mode status could switch between the two. Are you suggesting that we check 
safemode in both, or we combine the blocks into one? I assumed the intent was 
to avoid doing the potentially CPU-heavy chooseTarget work while synchronized.
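
For reference, the "check in both blocks" option might look roughly like the sketch below. All names here are hypothetical (TwoPhaseAllocator, SketchSafeModeException, the fake target list); it is not the actual getAdditionalBlock() code, and the exception is unchecked only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical unchecked stand-in for SafeModeException.
class SketchSafeModeException extends RuntimeException {
    SketchSafeModeException(String msg) { super(msg); }
}

// Hypothetical model of an allocation split into two synchronized sections.
class TwoPhaseAllocator {
    private boolean safeMode;
    private final List<String> blocks = new ArrayList<>();

    synchronized void setSafeMode(boolean on) { safeMode = on; }

    synchronized int blockCount() { return blocks.size(); }

    // Caller must hold the lock on this object.
    private void checkSafeMode(String action) {
        if (safeMode) {
            throw new SketchSafeModeException("Cannot " + action + " in safe mode");
        }
    }

    // Stand-in for the expensive target selection; runs WITHOUT the lock.
    String[] chooseTargets(String src) {
        return new String[] {"dn1", "dn2", "dn3"};
    }

    String allocate(String src) {
        synchronized (this) {
            // First check: fail fast before doing any expensive work.
            checkSafeMode("add block to " + src);
        }

        String[] targets = chooseTargets(src);

        synchronized (this) {
            // Second check: safe mode may have been entered while unlocked.
            checkSafeMode("add block to " + src);
            String block = "blk_" + blocks.size() + "@" + String.join(",", targets);
            blocks.add(block);
            return block;
        }
    }
}
```

The second check is the one that actually protects the namespace change; the first only avoids wasted work. Merging the two blocks into one would make the second check unnecessary, at the cost of holding the lock during target selection.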

bq. renewLease() shouldn't be under FSNamesystem lock? leaseManager has its own 
lock

This is to prevent safemode from switching while calling renewLease. If we 
decide that renewing a lease under safemode is not allowed, then we need to 
synchronize here. Otherwise the check is prone to races.

Regarding deadlock potential, I think we're safe since LeaseManager.Monitor 
synchronizes on FSNamesystem before synchronizing on the lease manager.
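
A small sketch of the lock ordering argument, using hypothetical names (LockOrderSketch, namesystemLock, leaseLock) rather than the real classes. Both the RPC path and the monitor path take the namesystem lock first and the lease lock second, so neither can hold one lock while waiting on the other in the opposite order. Returning false from renewLease in safe mode is an assumption made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the two locks discussed above.
class LockOrderSketch {
    private final Object namesystemLock = new Object(); // stands in for FSNamesystem
    private final Object leaseLock = new Object();      // stands in for the lease manager
    private boolean safeMode;                            // guarded by namesystemLock
    private final Map<String, Long> leases = new HashMap<>(); // guarded by leaseLock

    void setSafeMode(boolean on) {
        synchronized (namesystemLock) {
            safeMode = on;
        }
    }

    // RPC path: the safe mode check and the renewal happen under one
    // namesystem lock hold, so safe mode cannot flip between them.
    boolean renewLease(String holder, long now) {
        synchronized (namesystemLock) {
            if (safeMode) {
                return false; // assumption: renewal is refused in safe mode
            }
            synchronized (leaseLock) {
                leases.put(holder, now);
                return true;
            }
        }
    }

    // Monitor path: same order, namesystem lock first, then the lease lock.
    int expireLeases(long now, long limitMillis) {
        synchronized (namesystemLock) {
            synchronized (leaseLock) {
                int before = leases.size();
                leases.values().removeIf(renewedAt -> now - renewedAt > limitMillis);
                return before - leases.size();
            }
        }
    }
}
```

A deadlock would only become possible if some other path took the lease lock first and then tried to take the namesystem lock.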

bq. Your changes to permission methods incorporate HDFS-133.

Resolved that one as dup, thanks.

> saveNamespace can corrupt edits log
> -----------------------------------
>
>                 Key: HDFS-988
>                 URL: https://issues.apache.org/jira/browse/HDFS-988
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>         Attachments: hdfs-988.txt, saveNamespace.txt
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
