[ https://issues.apache.org/jira/browse/HDFS-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-6000:
----------------------------

    Attachment: HDFS-6000.003.patch

Thanks for the review, Nicholas! Updated the patch to address your comments.

bq. I think it is good to keep the txid parameter in 
FSImage.hasRollbackFSImage().
Yes, I also think adding a txid to hasRollbackFSImage to indicate the range of
the rollback image would be better. But since we already make sure we
purge/rename the rollback image after the rolling upgrade (including rollback
and downgrade), this is not necessary right now. Maybe we can do it in a
separate jira?

bq. Let's just checkNameNodeSafeMode for both HA and non-HA cases.
After an offline discussion with Nicholas, we think we can require the NN to be
in safemode when doing the checkpoint for rollback. The NN will then come out
of safemode automatically and create the RollingUpgradeInfo object. This
prevents clients from hanging while the namespace is being saved (they will get
a SafeModeException instead), and also guarantees that we do not change the NN
state while in safemode.
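To make the proposed ordering concrete, here is a minimal sketch over a toy
in-memory namespace. Every class and method in it (SafeModeRollbackSketch,
SafeModeSketchException, RollingUpgradeInfoSketch) is hypothetical and only
stands in for the real NameNode code: writes fail fast while the NN is in
safemode, the rollback checkpoint is refused outside safemode, and the rolling
upgrade info is created only after the NN leaves safemode.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for HDFS's SafeModeException. */
class SafeModeSketchException extends RuntimeException {
    SafeModeSketchException(String msg) {
        super(msg);
    }
}

/** Hypothetical record of an in-progress rolling upgrade. */
class RollingUpgradeInfoSketch {
    final long startTime;

    RollingUpgradeInfoSketch(long startTime) {
        this.startTime = startTime;
    }
}

/** Toy NameNode: the namespace is just a list of created paths. */
class SafeModeRollbackSketch {
    private boolean inSafeMode = false;
    private RollingUpgradeInfoSketch rollingUpgradeInfo = null;
    private final List<String> namespace = new ArrayList<>();

    void enterSafeMode() {
        inSafeMode = true;
    }

    /** Client write path: fails fast in safemode instead of blocking. */
    void mkdir(String path) {
        if (inSafeMode) {
            throw new SafeModeSketchException(
                "Cannot create " + path + ": NameNode is in safe mode");
        }
        namespace.add(path);
    }

    /** Copies the namespace as the rollback image; only allowed in safemode. */
    List<String> saveRollbackCheckpoint() {
        if (!inSafeMode) {
            throw new IllegalStateException("Rollback checkpoint requires safe mode");
        }
        // No client write can modify the namespace while this copy is taken.
        return new ArrayList<>(namespace);
    }

    /** Checkpoint first, then leave safemode, then record the upgrade info. */
    List<String> startRollingUpgrade(long startTime) {
        List<String> rollbackImage = saveRollbackCheckpoint();
        inSafeMode = false;
        rollingUpgradeInfo = new RollingUpgradeInfoSketch(startTime);
        return rollbackImage;
    }

    boolean isInSafeMode() {
        return inSafeMode;
    }

    RollingUpgradeInfoSketch getRollingUpgradeInfo() {
        return rollingUpgradeInfo;
    }
}
```

A client calling mkdir during the checkpoint gets an immediate exception it can
retry on, rather than a call that blocks for the whole save.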

bq. For HA case, do we need to roll edit before logStartRollingUpgrade?
Now the Rolling_Upgrade_Start transaction is only used to notify the SBN to
create the rollback checkpoint. Thus we can include the rolling upgrade info in
the rollback image, and we do not need to roll the edit log before
logStartRollingUpgrade.
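A hypothetical sketch of the SBN side, assuming the standby applies edits in
txid order and checkpoints on demand: seeing the start op only marks the next
checkpoint as the rollback image, and the upgrade start time travels inside
that image, so no edit log roll is involved. SketchOpType, SketchEditOp,
SketchCheckpoint and StandbyTailerSketch are illustrative names, not HDFS
classes.

```java
import java.util.List;

/** Hypothetical edit-log op types; only what this sketch needs. */
enum SketchOpType { MKDIR, ROLLING_UPGRADE_START }

/** Hypothetical edit-log entry. */
class SketchEditOp {
    final long txid;
    final SketchOpType type;
    final long startTime; // only meaningful for ROLLING_UPGRADE_START

    SketchEditOp(long txid, SketchOpType type, long startTime) {
        this.txid = txid;
        this.type = type;
        this.startTime = startTime;
    }
}

/** Hypothetical checkpoint produced by the SBN. */
class SketchCheckpoint {
    final long txid;
    final boolean isRollback;
    final Long rollingUpgradeStartTime; // null if no upgrade has started

    SketchCheckpoint(long txid, boolean isRollback, Long rollingUpgradeStartTime) {
        this.txid = txid;
        this.isRollback = isRollback;
        this.rollingUpgradeStartTime = rollingUpgradeStartTime;
    }
}

/** Toy SBN that tails edits and checkpoints on demand. */
class StandbyTailerSketch {
    private long lastAppliedTxId = 0;
    private boolean rollbackCheckpointPending = false;
    private Long rollingUpgradeStartTime = null;

    void applyEdits(List<SketchEditOp> ops) {
        for (SketchEditOp op : ops) {
            if (op.type == SketchOpType.ROLLING_UPGRADE_START) {
                // The op only signals that the next checkpoint is the rollback
                // image; the edit log is not rolled here.
                rollbackCheckpointPending = true;
                rollingUpgradeStartTime = op.startTime;
            }
            lastAppliedTxId = op.txid;
        }
    }

    /** The first checkpoint after the start op is the rollback image. */
    SketchCheckpoint doCheckpoint() {
        boolean rollback = rollbackCheckpointPending;
        rollbackCheckpointPending = false;
        // The upgrade info is stored in the image itself.
        return new SketchCheckpoint(lastAppliedTxId, rollback, rollingUpgradeStartTime);
    }
}
```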


> Avoid saving namespace when starting rolling upgrade
> ----------------------------------------------------
>
>                 Key: HDFS-6000
>                 URL: https://issues.apache.org/jira/browse/HDFS-6000
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, ha, hdfs-client, namenode
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-6000.000.patch, HDFS-6000.001.patch, 
> HDFS-6000.002.patch, HDFS-6000.003.patch
>
>
> Currently, when the administrator sends the "rollingUpgrade start" command to
> the active NN, the NN triggers a checkpoint (the rollback fsimage). This
> makes the NN unable to serve requests for a period of time.
> An alternative is to let the SBN do the checkpoint, and rename the first
> checkpoint taken after starting the rolling upgrade to the rollback image.
> Once the rollback image is on both the ANN and the SBN, the administrator can
> start upgrading the software.
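The renaming step and the readiness check described above might look roughly
like the following, using java.nio.file. The "fsimage_" / "fsimage_rollback_"
prefixes, the 19-digit zero-padded txid and the RollbackImageSketch class are
all assumptions made for illustration; how the image actually reaches the ANN
(for example, the SBN uploading its checkpoint) is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper; the file naming below is assumed, not taken from HDFS. */
class RollbackImageSketch {
    static final String IMAGE_PREFIX = "fsimage_";
    static final String ROLLBACK_PREFIX = "fsimage_rollback_";

    static String imageName(String prefix, long txid) {
        return String.format("%s%019d", prefix, txid);
    }

    /** Renames the checkpoint for txid into the rollback image; returns the new path. */
    static Path markAsRollback(Path storageDir, long txid) throws IOException {
        Path image = storageDir.resolve(imageName(IMAGE_PREFIX, txid));
        Path rollback = storageDir.resolve(imageName(ROLLBACK_PREFIX, txid));
        return Files.move(image, rollback);
    }

    /** True if the storage directory holds any rollback image. */
    static boolean hasRollbackImage(Path storageDir) throws IOException {
        try (Stream<Path> files = Files.list(storageDir)) {
            return files.anyMatch(
                p -> p.getFileName().toString().startsWith(ROLLBACK_PREFIX));
        }
    }

    /** The administrator proceeds only once both NNs hold a rollback image. */
    static boolean readyToUpgrade(Path annDir, Path sbnDir) throws IOException {
        return hasRollbackImage(annDir) && hasRollbackImage(sbnDir);
    }
}
```

In this sketch readyToUpgrade stays false after the SBN renames its image, and
only turns true once a copy also exists in the ANN's storage directory.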



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
