[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

Aaron T. Myers (JIRA) Wed, 15 Jan 2014 14:23:18 -0800

    [ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872690#comment-13872690 ]


Aaron T. Myers commented on HDFS-5138:
--------------------------------------

bq. In documentation you say " [[2]] Both NNs must be started with the 
<<<'-upgrade'>>> flag." Does this mean both the namenodes should be available 
during upgrade, or does it just mean that the namenodes must be started with 
-upgrade? Can one namenode upgrade first (and possibly be finalized) and 
the second NN be upgraded later?

Closer to the latter. One NN can upgrade first, and even do the upgrade of the 
shared log, and later the second NN can be started with the -upgrade flag. It 
will see that an upgrade is in progress by the presence of the shared log lock 
and do its local upgrade with that CTime. One cannot, however, start the second 
NN with the -upgrade flag after the upgrade has been finalized, since 
finalizing removes the shared log lock.
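
The rule above can be sketched as a toy model. This is not HDFS code: the names SharedLog, first_nn_upgrade, second_nn_upgrade and finalize_upgrade are hypothetical, and the lock is reduced to a single optional CTime value purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedLog:
    """Hypothetical stand-in for the upgrade state kept with the shared log."""
    lock_ctime: Optional[int] = None  # present only while an upgrade is in progress


def first_nn_upgrade(shared: SharedLog, ctime: int) -> int:
    """First NN started with -upgrade: upgrades the shared log and takes the lock."""
    shared.lock_ctime = ctime
    return ctime


def second_nn_upgrade(shared: SharedLog) -> int:
    """Second NN started with -upgrade: reuses the CTime recorded by the lock."""
    if shared.lock_ctime is None:
        # No lock means no upgrade in progress (e.g. it was already finalized).
        raise RuntimeError("shared log lock is absent; cannot join upgrade")
    return shared.lock_ctime


def finalize_upgrade(shared: SharedLog) -> None:
    """Finalizing removes the shared log lock."""
    shared.lock_ctime = None
```

In this sketch, calling second_nn_upgrade after finalize_upgrade raises an error, which mirrors the restriction described above.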

bq. When active namenode is performing shared edits upgrade, if it fails, does 
fail over occur to the standby and does the new active resume the upgrade? Same 
question for finalize and rollback.

I.e., if it fails to upgrade the shared log? That NN would shut down, and when 
the other NN became active (either manually or automatically), yes, it would 
try to upgrade the shared log at that time. Finalization - no; failure of that 
procedure would require the admin to re-attempt the finalization once the 
system was back up, and finalization requires both NNs to be running.

bq. In documentation "The operator should run the roll back command on one of 
the NN boxes,...", could have issues related to which NN is chosen. It must be 
on the one where upgrade has been previously done right?

Well, I had been assuming that both NNs had already been upgraded, in which 
case no, it doesn't matter which NN does the rollback. If the NN you tried to 
run rollback from had not in fact already been upgraded then it won't let you 
start with the '-rollback' option.

bq. Given the rollback procedure, where bootstrapStandby must be done on one 
of the NNs, why not just upgrade a single namenode (without worrying about two 
namenodes racing to upgrade etc.) and just follow the same procedure as 
rollback to simplify this?

That would certainly simplify the code quite a bit, since we could just assume 
that only one NN is running during the actual upgrade procedure, and I 
considered this option. Doing so means that there'd be some asymmetry between 
the two nodes involved in the whole HA upgrade procedure, e.g. you would then 
_have_ to do the rollback on the NN where you initiated the upgrade, but 
perhaps that's acceptable since layout version upgrades are relatively rare. If 
you'd be more comfortable with this approach then I can think about what it 
would take to rework the patch.

bq. Another thing that comes to mind is, the lock files are created on JNs. 
What if the lock file was created on all but was deleted on only two? How does 
the presence of a lock file on a JN affect the system?

In that case the finalization would fail and would need to be re-attempted.
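
A minimal sketch of that behaviour, assuming finalization succeeds only once the lock file is gone from every JN. The function name, the dict of lock flags and the notion of a "reachable" set are hypothetical and not taken from the patch.

```python
from typing import Dict, Set


def finalize_on_journal_nodes(locks: Dict[str, bool], reachable: Set[str]) -> bool:
    """Try to delete the lock file on every JN.

    locks maps a (hypothetical) JN name to whether its lock file exists.
    Only JNs in `reachable` can have their lock deleted in this attempt.
    Returns True when no lock file remains on any JN.
    """
    for jn in locks:
        if jn in reachable:
            locks[jn] = False  # lock file deleted on this JN
    # A leftover lock on any JN means finalization failed and must be retried.
    return not any(locks.values())
```

With three JNs and only two reachable, the first call returns False and leaves one lock behind; re-running it once all three are reachable returns True.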

bq. FSNamesystem.java. I see that IDEs expand a.b.c.* imports to individual 
imports. You are changing it back to a.b.c.* in your patch.

Yea, that's where I recall resolving import conflicts. I'll take a look and fix 
those once I hear back from you on the above.

> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
