Re: Release Policy (was: Re: [Linux-HA] 2.1.2 and failover of colocated resources)

Max Hofer Wed, 19 Sep 2007 09:43:38 -0700

On Wednesday 19 September 2007, Raoul Bhatia [IPAX] wrote:
> hi,
>
> Andrew Beekhof wrote:
>  > shortly I'll be releasing a revised implementation (including
>  > documentation!) of colocation which will make it much more intuitive
>  > and remove the need for hacks like symmetrical=true
>  >
>  > if anyone wants to try it sooner rather than later, grab the latest
>  > from http://hg.linux-ha.org/dev and ping me for the current version of
>  > the docs.
>
> i recently joined this mailinglist and saw that there is a lot of
> activity going on. after reading the previous statement, i wonder
> what the release policy for linux-ha is like.
>
> will 2.1.3 include the new colocation implementation or will this become
> available in 2.2.0? will i have to carefully test each "minor" release
> (e.g. 2.1.3, 2.1.4, etc.) or are precautions taken (apart from the test
> suite) so that things won't break?
>
Just to make one thing clear - I would never ever upgrade to a new version 
before testing it extensively on a test system.


What I do with a new release in my test environment:
I.) Test new version with clean system configuration (on all nodes):
- make cluster standby on all nodes
- cibadmin -E
- shutdown heartbeat on all nodes
- manually remove CIB file on all nodes
- upgrade to new release on all nodes
- start heartbeat on all nodes
- load the previous CIB configuration
- now test
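
A minimal sketch of the sequence above, written as a Python plan generator 
so the ordering is explicit. It only builds a list of step descriptions and 
does not run anything on a cluster. The node names, the function name and 
the "once"/"per-node" split are assumptions made for illustration (in 
particular, running cibadmin -E once for the whole cluster is an assumption, 
not something stated above).

```python
# Ordered phases of the clean-configuration test. "per-node" phases are
# finished on every node before the next phase starts; "once" phases are
# assumed to be cluster-wide actions.
PHASES = [
    ("per-node", "set node to standby"),
    ("once", "cibadmin -E"),
    ("per-node", "stop heartbeat"),
    ("per-node", "remove CIB file"),
    ("per-node", "upgrade to new release"),
    ("per-node", "start heartbeat"),
    ("once", "load previous CIB configuration"),
    ("once", "run tests"),
]


def clean_config_test_plan(nodes):
    """Expand PHASES into a flat, ordered list of step descriptions."""
    plan = []
    for scope, action in PHASES:
        if scope == "per-node":
            plan.extend(f"{node}: {action}" for node in nodes)
        else:
            plan.append(f"cluster: {action}")
    return plan


if __name__ == "__main__":
    # Hypothetical two-node test cluster.
    for number, step in enumerate(clean_config_test_plan(["node1", "node2"]), 1):
        print(f"{number:2d}. {step}")
```

Printing the plan before touching the test cluster makes it easy to check 
that no node is upgraded while heartbeat is still running on it.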

Once all my tests have passed, I'm at least sure that the upgrade cycle had 
no impact on the cluster.

NOTE: If you have to change the CIB to get the old cluster behaviour, your 
upgrade process will be more painful.

II.) Test the upgrade process:

1.) You did not have to change the CIB:

a) back to old version:
- clean heartbeat (set all nodes standby, cibadmin -E, heartbeat stop, delete 
CIB file on all nodes, remove new heartbeat version)
- install old heartbeat version on all nodes
- start heartbeat on all nodes
- load CIB

b) now perform this for all nodes (one after the other):
- set the node to standby
- upgrade to new heartbeat
- activate node again

c) test again ---> if all tests passed "Hurray"
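
Step b) is a rolling upgrade: each node goes through standby, upgrade and 
activate before the next one is touched. The sketch below models that in 
Python and checks that at most one node is out of service at any time. The 
function names and node names are hypothetical, and nothing here talks to a 
real cluster.

```python
def rolling_upgrade_plan(nodes):
    """Return (node, action) steps for upgrading one node after the other."""
    plan = []
    for node in nodes:
        plan.append((node, "standby"))
        plan.append((node, "upgrade heartbeat"))
        plan.append((node, "activate"))
    # Step c): test again once every node runs the new version.
    plan.append((None, "run tests"))
    return plan


def max_nodes_in_standby(plan):
    """Replay the plan and report the largest number of nodes in standby."""
    standby = set()
    worst = 0
    for node, action in plan:
        if action == "standby":
            standby.add(node)
        elif action == "activate":
            standby.discard(node)
        worst = max(worst, len(standby))
    return worst


if __name__ == "__main__":
    plan = rolling_upgrade_plan(["node1", "node2", "node3"])
    for node, action in plan:
        print(f"{node or 'cluster'}: {action}")
    print("max nodes in standby at once:", max_nodes_in_standby(plan))
```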

2.) Let's assume you had to change the CIB to get the same cluster behaviour 
as you had with the old heartbeat version.

Old CIB: cib-old.xml
New CIB: cib-new.xml

Goal: based on the two CIBs, you have to find a way to upgrade heartbeat 
with the shortest downtime. This is not a trivial task, because what you 
can/should do and what not really depends on the CIB difference and is 
specific to your configuration.
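
Since the choice depends on how much the two CIBs differ, a first step can 
be to look at the delta between cib-old.xml and cib-new.xml. Below is a 
minimal sketch using Python's difflib as a plain line-based diff (it is not 
CIB-aware and ignores XML structure). The sample XML fragments are made up 
for illustration and are not a real configuration.

```python
import difflib


def cib_delta(old_xml, new_xml):
    """Return only the removed ('-') and added ('+') lines between two CIBs."""
    diff = list(
        difflib.unified_diff(
            old_xml.splitlines(),
            new_xml.splitlines(),
            fromfile="cib-old.xml",
            tofile="cib-new.xml",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two entries are the '---' / '+++' file headers; skip them,
    # then drop the '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical fragments; in practice read the two files from disk.
OLD = """<constraints>
  <rsc_colocation id="c1" from="ip" to="web" score="INFINITY" symmetrical="true"/>
</constraints>"""

NEW = """<constraints>
  <rsc_colocation id="c1" from="ip" to="web" score="INFINITY"/>
</constraints>"""

if __name__ == "__main__":
    delta = cib_delta(OLD, NEW)
    print(f"{len(delta)} changed line(s)")
    for line in delta:
        print(line)
```

A delta of a handful of lines is a hint that the second approach below might 
be feasible; a large delta points towards the first one.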

1.) standby-upgrade-activate process: this is the most secure way to upgrade, 
but the cluster downtime may be several minutes 
- set one node standby
- upgrade heartbeat on the standby node
- set all nodes to standby (downtime starts as soon as the last node is in 
standby)
- load cib-new.xml 
- activate node with new heartbeat version (here you have the cluster up 
again)
- upgrade heartbeat on all other nodes
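
The downtime window of this process can be made explicit with a small 
model. The Python sketch below builds the step list above and replays it to 
find the first step at which no node is active and the step at which the 
cluster comes back. Function names and node names are hypothetical; the 
steps are plain labels and nothing is executed on a cluster.

```python
def standby_upgrade_activate_plan(nodes):
    """Return (node, action) steps; the first node is upgraded ahead of time."""
    first, rest = nodes[0], nodes[1:]
    plan = [(first, "standby"), (first, "upgrade heartbeat")]
    # Putting the remaining nodes into standby takes the cluster down.
    plan += [(node, "standby") for node in rest]
    plan.append((None, "load cib-new.xml"))
    # The already upgraded node brings the cluster up again.
    plan.append((first, "activate"))
    plan += [(node, "upgrade heartbeat") for node in rest]
    return plan


def downtime_window(plan, nodes):
    """Return (start, end) plan indices between which no node is active."""
    active = set(nodes)
    start = end = None
    for index, (node, action) in enumerate(plan):
        if action == "standby":
            active.discard(node)
        elif action == "activate":
            active.add(node)
        if not active and start is None:
            start = index
        if active and start is not None and end is None:
            end = index
    return start, end


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    plan = standby_upgrade_activate_plan(nodes)
    start, end = downtime_window(plan, nodes)
    for index, (node, action) in enumerate(plan):
        marker = " <- downtime" if start <= index < end else ""
        print(f"{index}: {node or 'cluster'}: {action}{marker}")
```

In this model the only work done during the downtime is loading 
cib-new.xml, which is why the first node is upgraded before the others are 
put into standby.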

2.) set node/resources to unmanaged mode: this presupposes that you have only 
a small CIB delta and that you really know what effect the change has 
(ptest is your friend for that). 

So far I have always used version 1) because it seems to be more robust. I 
tried 2) a couple of times, but it sometimes ended with heartbeat dying on a 
subsequent stop (but this may be fixed in newer heartbeat versions).

kind regards Max




