Re: About Geode rolling downgrade

Nabarun Nag Fri, 05 Jun 2020 02:01:30 -0700

Hi Mario and Alberto,

I will sync up with a couple of engineers and get you feedback within a couple
of days.

@Barry, Jason and I were discussing this once: could your idea of WAN GII
achieve the downgrade? For example, create a DS with the old version, let it do
a GII from the newer-version cluster, and then shut down the new-version DS.
That would leave us with a DS on the lower version.

Regards
Naba

________________________________
From: Mario Ivanac <mario.iva...@est.tech>
Sent: Friday, June 5, 2020 1:19:42 AM
To: geode <dev@geode.apache.org>
Subject: Re: About Geode rolling downgrade

Hi all,

just a reminder that Alberto is still waiting for feedback,
regarding his question.

BR,
Mario
________________________________
From: Alberto Gomez <alberto.go...@est.tech>
Sent: Thursday, May 14, 2020 2:45 PM
To: geode <dev@geode.apache.org>
Subject: Re: About Geode rolling downgrade

Hi,

A friendly reminder to the community about this request for feedback.

Thanks,

-Alberto G.
________________________________
From: Alberto Gomez <alberto.go...@est.tech>
Sent: Thursday, May 7, 2020 10:44 AM
To: geode <dev@geode.apache.org>
Subject: Re: About Geode rolling downgrade

Hi again,

Considering that Geode does not support online rollback for the time being, and
since we need to be able to roll back even a standalone system, we have been
thinking of a procedure to downgrade a Geode cluster that tolerates downtime but
does not require us to:

  *   spin up another cluster to sync from,
  *   do a restore or
  *   import a data snapshot.

The procedure we came up with is:

  1.  First step - downgrade locators:

     a.  While still on the newer version, export the cluster configuration.
     b.  Shut down all locators. Existing clients will continue using their
server connections. New clients/connections are not possible.
     c.  Start new locators using the old SW version and import the cluster
configuration. They will form a new cluster. Existing client connections should
still work, but new client connections are not yet possible (no servers
connected to locators).

  2.  Second step - downgrade servers:

     a.  First shut down all servers in parallel. This marks the beginning of
total downtime.
     b.  Now start all servers in parallel but still on the new software
version. Servers connect to the cluster formed by the downgraded locators. When
servers are up, downtime ends. New client connections are possible. The rest of
the rollback should be fully online.
     c.  Now per server:

          i.   Shut it down, revoke its disk-stores and delete its file system.

          ii.  Start the server using the old SW version. When up, the server
will take over the cluster configuration and pick up replicated data and
partitioned region buckets, satisfying region redundancy (essentially it will
hold exactly the same data the previous server had).
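
To illustrate why the per-server part of the second step depends on region
redundancy, here is a minimal sketch in Python. It is a toy model, not Geode
code: the round-robin bucket placement, the function names and the server names
are all assumptions made for the example. It replaces each server in turn with
an empty one and reports the lowest number of live copies any bucket had along
the way.

```python
def assign_buckets(servers, num_buckets, redundancy):
    """Place redundancy + 1 copies of each bucket on distinct servers.

    Round-robin placement is an assumption of this toy model only.
    """
    copies = redundancy + 1
    if copies > len(servers):
        raise ValueError("need at least redundancy + 1 servers")
    return {
        bucket: {servers[(bucket + i) % len(servers)] for i in range(copies)}
        for bucket in range(num_buckets)
    }


def rolling_replace(servers, placement, redundancy):
    """Replace servers one at a time; return the lowest live copy count seen."""
    lowest = redundancy + 1
    for server in servers:
        # Shut the server down and discard its disk stores:
        # every bucket copy it hosted is gone.
        for hosts in placement.values():
            hosts.discard(server)
        lowest = min(lowest, min(len(hosts) for hosts in placement.values()))
        # Restart it empty on the old version: redundancy recovery copies
        # under-replicated buckets onto it from the surviving hosts.
        for hosts in placement.values():
            if len(hosts) < redundancy + 1:
                hosts.add(server)
    return lowest


servers = ["server1", "server2", "server3"]  # hypothetical member names
with_redundancy = assign_buckets(servers, 6, redundancy=1)
without_redundancy = assign_buckets(servers, 6, redundancy=0)
print(rolling_replace(servers, with_redundancy, 1))
print(rolling_replace(servers, without_redundancy, 0))
```

In this model a result of 0 means some bucket had no surviving copy at some
point, i.e. its data would have been lost when the disk stores were deleted.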

The above has some important prerequisites:

  1.  Partitioned regions have redundancy and region configuration allows 
recovery as described above.
  2.  Client versions allow connection to both the new and old clusters - i.e.
clients must not be using the newer version at the moment the procedure starts.
  3.  Geode guarantees that a cluster configuration exported from a newer system
can be imported into an older system. In case of incompatibility I expect we
could even manually edit the configuration to adapt it to the older system, but
it is an open question how the new servers will react when they connect (in
step 2b).
  4.  Geode guarantees communication between peers with different SW version 
works and recovery of region data works.
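
Prerequisite 2 is easy to check up front. A minimal sketch in Python, assuming
plain dotted numeric version strings; the helper names and the version numbers
used below are illustrative only and are not part of any Geode API:

```python
def parse_version(text):
    """Turn a dotted version string such as '1.12.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def clients_allow_downgrade(client_versions, old_server_version):
    """True if no client is newer than the version we want to roll back to."""
    target = parse_version(old_server_version)
    return all(parse_version(version) <= target for version in client_versions)


# Illustrative values: rolling back to 1.11.0 with two client populations.
print(clients_allow_downgrade(["1.10.0", "1.11.0"], "1.11.0"))
print(clients_allow_downgrade(["1.12.0"], "1.11.0"))
```

Comparing tuples of integers rather than raw strings avoids ordering "1.9.2"
after "1.10.0".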

Could we have opinions on this offline procedure? It seems to work well but 
probably has caveats we do not see at the moment.

What about prerequisites 3 and 4? They hold in the upgrade case, but we are not
sure they hold in this rollback case.

Best regards,

-Alberto G.

________________________________
From: Anilkumar Gingade <aging...@pivotal.io>
Sent: Thursday, April 23, 2020 12:59 AM
To: geode <dev@geode.apache.org>
Subject: Re: About Geode rolling downgrade

That's right. In most (if not all) cases, a no-downtime requirement is met by
having replicated cluster setups (disaster-recovery/backup site). The data is
either pushed to both systems through the data ingesters or replicated using a
WAN setup.
The clusters are upgraded one at a time. If there is a failure during the
upgrade, or it needs to be rolled back, one system will always be up
and running.

-Anil.

On Wed, Apr 22, 2020 at 1:51 PM Anthony Baker <aba...@pivotal.io> wrote:

> Anil, let me see if I understand your perspective by stating it this way:
>
> In cases where 100% uptime is a requirement, users are almost always
> running a disaster recovery site.  It could be active/active or
> active/standby but there are already at least 2 clusters with current
> copies of the data.  If an upgrade goes badly, the clusters can be
> downgraded one at a time without loss of availability.  This is because we
> ensure compatibility across the wan protocol.
>
> Is that correct?
>
>
> Anthony
>
>
>
> > On Apr 22, 2020, at 10:43 AM, Anilkumar Gingade <aging...@pivotal.io> wrote:
> >
> >>> Rolling downgrade is a pretty important requirement for our customers
> >>> I'd love to hear what others think about whether this feature is worth
> >>> the overhead of making sure downgrades can always work.
> >
> > I/We haven't seen users/customers requesting rolling downgrade as a
> > critical requirement for them; most of the time they had both an old and
> > new setup to upgrade or switch back to an older setup.
> > Considering the amount of work involved and the code complexity it brings
> > in, while there are ways to downgrade, it is hard to justify supporting this
> > feature.
> >
> > -Anil.
>
>
