On Tue, Jan 26, 2016 at 11:33 AM, Matteo Bertozzi <[email protected]>
wrote:

> On Tue, Jan 26, 2016 at 11:15 AM, Enis Söztutar <[email protected]> wrote:
>
> > Yep, my feeling is that we should make this rolling upgradable. Having
> > coordination for master to be upgraded first then region servers is fine,
> > but may end up being problematic in practice. I am not sure how we can
> > enforce this (new regionservers cannot talk to old masters, for example,
> > or cannot come up due to fs changes?).
> >
>
> I think the upgrade deploy order is crucial here, to be able to have
> rolling upgrades.
> The trick is to rely on the master as the "migration coordinator", since it
> is the only one aware of the full cluster.
> This is basically what we do now by hand when we want to enable new
> features, which is more or less a double rolling restart:
>  - upgrade one of the masters (nothing changes in format or rpc calls)
>  - upgrade the backup masters (nothing changes in format or rpc calls)
>  - start upgrading the RSs (nothing changes in format or rpc calls)
>  - all the RSs are on the new version
>  - we can start triggering migrations as if they were simple region moves
>    from the balancer
>

For things like the new assignment manager, for example, I think we can make
a totally different and non-compatible implementation using a technique
like the one you suggest.

We can have the new master come up, at which time it will look at its own
version and the percentage of regionservers having a version at least its
own. It will wait until all of the reported regionservers are upgraded. Once
they are, the master switches to the new implementation and starts rejecting
regionservers with an older version that try to join the cluster. The switch
is also persisted in zk, so that it happens only once.



>
>
> > > > if we have replication working between 1.x and 2.x
> > >
> > > This is required I think. Stranded users on 0.94 went so far as to hack
> > > new replication endpoints to get it working between 0.94 and 0.96+.
> > >
> >
> > +1. The RPC wire format is not changing in 2.0, is it?
> >
>
> Yeah, there should be no wire changes. New methods are added, but that is
> as usual.
> And I'm pretty sure replication-v2 was described as something that runs in
> parallel to v1 (or something like that).
>
>
> > > > is it acceptable to force people to move to the latest 1.x (e.g.
> > > > 1.5.x)?
> > >
> > > If this would be the only way to upgrade without downtime, then yes,
> > > we'd take it.
> > >
> >
> > Agreed. It is acceptable to require 1.x latest. It should be a last
> > resort though.
> >
> > 0.98 -> 2.0 will depend on whether latest 1.x is required or not I think.
>
>
> The 1.x changes should mainly be there to let us have less compatibility
> code in 2.0. We can probably do without them (still looking into that), but
> one thing for sure is that the changes are things like new rpc calls, so
> they can easily be added to a 1.1.y or even 0.98.y if we don't want to
> force people to go to the latest 1.x.
>
