[
https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238562#comment-13238562
]
Andrew Purtell commented on HADOOP-8209:
----------------------------------------
+1
We use this practice.
This makes it possible to do a rolling restart of DataNodes without bouncing the
NameNode and taking down service. This is most useful when the change scope is
restricted to the DN. If HA is backported to branch-1 we could handle most NN
changes similarly: Upgrade the NNs one at a time with manual failover for no
downtime. One issue remaining is that modification of a NN<->DN interface
method requires a kludgy migration over three updates.
It is also possible to do this with TaskTrackers, but this will fail currently
running tasks on the TT. Even so we can still stage in a TT bugfix release,
just more slowly. Bouncing the JobTracker remains a big deal, but the
maintenance window for that becomes very short if everything else has been
rolled out ahead of time. With some "HA JT" option for branch-1 (Corona?) this
might also have no downtime.
> Add option to enable DN and TT rolling upgrades in branch-1
> -----------------------------------------------------------
>
> Key: HADOOP-8209
> URL: https://issues.apache.org/jira/browse/HADOOP-8209
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 1.0.0
> Reporter: Eli Collins
> Assignee: Eli Collins
>
> In 1.x DNs currently refuse to connect to NNs if their build *revision* (i.e.
> svn revision) does not match. TTs refuse to connect to JTs if their build
> *version* (version, revision, user, and source checksum) does not match.
> This prevents rolling upgrades, which is intentional, see the discussion in
> HADOOP-5203. The primary motivation in that jira was (1) it's difficult to
> guarantee every build on a large cluster got deployed correctly, builds don't
> get rolled back to old versions by accident etc, and (2) mixed versions can
> lead to execution problems that are hard to debug.
> However there are also cases when users know that two builds are compatible,
> eg when deploying a new build which contains the same contents as the
> previous one, plus a critical security patch that does not affect
> compatibility. Currently deploying a one-line patch requires taking down the
> entire cluster (or trying to work around the issue by lying about the build
> revision or checksum, yuck). These users would like to be able to perform a
> rolling upgrade.
> In order to support this, let's add an option that is off by default, but,
> when enabled, makes the DN and TT version check just check for an exact
> version match (eg "1.0.2") but ignore the build revision (DN) and the source
> checksum (TT). Two builds still need to match the major, minor, and point
> numbers, but nothing else.
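
As a rough illustration of the check described in the issue above, here is a
minimal sketch in Java. It is not the actual patch: the class name
VersionCheckSketch, the BuildInfo holder, the relaxedCheck flag, and the exact
layout of the combined build-version string are all assumptions made for this
example. It only shows the difference between the strict comparison (revision
for the DN, full build version for the TT) and the proposed relaxed comparison
(version number only).

```java
public class VersionCheckSketch {

    /** Hypothetical holder for the build metadata mentioned in the issue. */
    public static final class BuildInfo {
        final String version;     // e.g. "1.0.2"
        final String revision;    // svn revision of the build
        final String user;        // user who compiled the build
        final String srcChecksum; // checksum of the source tree

        public BuildInfo(String version, String revision, String user, String srcChecksum) {
            this.version = version;
            this.revision = revision;
            this.user = user;
            this.srcChecksum = srcChecksum;
        }

        /** Combined identity string; this layout is an assumption for the sketch. */
        String buildVersion() {
            return version + " from " + revision + " by " + user
                    + " source checksum " + srcChecksum;
        }
    }

    /** DN side: strict mode compares revisions, relaxed mode compares only the version. */
    public static boolean dataNodeMayConnect(BuildInfo nn, BuildInfo dn, boolean relaxedCheck) {
        if (relaxedCheck) {
            return nn.version.equals(dn.version);
        }
        return nn.revision.equals(dn.revision);
    }

    /** TT side: strict mode compares the full build version, relaxed mode only the version. */
    public static boolean taskTrackerMayConnect(BuildInfo jt, BuildInfo tt, boolean relaxedCheck) {
        if (relaxedCheck) {
            return jt.version.equals(tt.version);
        }
        return jt.buildVersion().equals(tt.buildVersion());
    }

    public static void main(String[] args) {
        // Same release number, but the second build carries an extra patch.
        BuildInfo master = new BuildInfo("1.0.2", "r1300000", "builder", "aaaa");
        BuildInfo patched = new BuildInfo("1.0.2", "r1300001", "builder", "bbbb");

        System.out.println("DN strict:  " + dataNodeMayConnect(master, patched, false));
        System.out.println("DN relaxed: " + dataNodeMayConnect(master, patched, true));
        System.out.println("TT strict:  " + taskTrackerMayConnect(master, patched, false));
        System.out.println("TT relaxed: " + taskTrackerMayConnect(master, patched, true));
    }
}
```

With the flag off, the patched build is rejected in both cases, which matches
the current behavior described in the issue. With the flag on, only a
difference in the major, minor, or point number causes a rejection.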