Re: Release Policy (was: Re: [Linux-HA] 2.1.2 and failover of colocated resources)

Peter Clapham Thu, 20 Sep 2007 01:09:23 -0700

auten taken (apart from the test
suite) that things won't break.

Just to make one thing clear: I would never ever upgrade to a new version
before testing it extensively on a test system.

What I do on a new release in my test environment:
I) Test the new version with a clean system configuration (on all nodes):
- make cluster standby on all nodes
- cibadmin -E
- shutdown heartbeat on all nodes
- manually remove CIB file on all nodes
- upgrade to new release on all nodes
- start heartbeat on all nodes
- load the previous CIB configuration
- now test


This is quite good. Perhaps you could write it at linux-ha.org.


Very much so ! Please do :-)

After all my tests have passed I am at least sure the upgrade cycle had no impact on
the cluster.
NOTE: If you have to change the CIB to get the old cluster behaviour, your upgrade
process will be more painful.
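
The clean-configuration test in part I can be written down as an ordered plan. The following is only a minimal sketch: the function name and the step labels are made up for illustration and are not real commands, apart from "cibadmin -E", which is quoted from the list above. Saving the CIB first is an assumption, implied by the later "load the previous CIB configuration" step.

```python
# Sketch only: the step names are descriptive labels, not real commands,
# except "cibadmin -E", which is quoted from the procedure above.

def clean_config_test_plan(nodes):
    """Return the ordered (target, action) steps for the clean-configuration test."""
    # Assumption: the CIB is saved first so that it can be reloaded at the end.
    plan = [("cluster", "save the current CIB to a file")]
    plan += [(node, "set node to standby") for node in nodes]
    plan.append(("cluster", "cibadmin -E"))
    # Each of these phases is finished on every node before the next one starts.
    for action in ("stop heartbeat", "remove the CIB file",
                   "upgrade to the new release", "start heartbeat"):
        plan += [(node, action) for node in nodes]
    plan.append(("cluster", "load the saved CIB"))
    plan.append(("cluster", "run the tests"))
    return plan


plan = clean_config_test_plan(["node1", "node2"])
for number, (target, action) in enumerate(plan, start=1):
    print(f"{number:2d}. {target}: {action}")
```

Writing the plan per phase rather than per node mirrors the "on all nodes" wording of each step.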

II) Testing Upgrade Process:

1) You did not have to change the CIB:

a) back to old version:
- clean heartbeat (set all nodes standby, cibadmin -E, heartbeat stop, delete
CIB file on all nodes, remove new heartbeat version)
- install old heartbeat version on all nodes
- start heartbeat on all nodes
- load CIB

b) now perform this for all nodes (one after the other):
- set the node to standby
- upgrade to new heartbeat
- activate node again

c) test again ---> if all tests passed "Hurray"
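
Step b) is a rolling upgrade: one node at a time is taken out, upgraded and brought back. A minimal sketch of that ordering follows, assuming hypothetical node names and using plain action labels rather than real commands. The helper at the end checks the property that makes this variant attractive, namely that no more than one node is in standby at any moment.

```python
def rolling_upgrade_plan(nodes):
    """Return (node, action) steps: each node is fully handled before the next."""
    plan = []
    for node in nodes:
        plan.append((node, "standby"))
        plan.append((node, "upgrade"))
        plan.append((node, "activate"))
    return plan


def max_nodes_in_standby(plan):
    """Walk the plan and return the largest number of nodes in standby at once."""
    standby = set()
    peak = 0
    for node, action in plan:
        if action == "standby":
            standby.add(node)
        elif action == "activate":
            standby.discard(node)
        peak = max(peak, len(standby))
    return peak


rolling = rolling_upgrade_plan(["node1", "node2", "node3"])
print(max_nodes_in_standby(rolling))
```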
2) Let's assume you had to change the CIB to get the same cluster behaviour
as you had with the old heartbeat version.

Old CIB: cib-old.xml
New CIB: cib-new.xml
Goal: based on the input of the 2 CIBs you have to find a way to upgrade
heartbeat with the shortest downtime. This is not a trivial task because it
really depends on the CIB difference, and what you can/should do and what not
is specific to your configuration.
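
A first look at the CIB difference can be had with a plain line-based diff. The sketch below uses Python's standard difflib module; the two CIB fragments are made-up samples, not taken from a real cluster, and a line diff only works well if both files are formatted the same way.

```python
import difflib

# Made-up sample fragments, standing in for the contents of
# cib-old.xml and cib-new.xml.
OLD_CIB = """<nvpair id="a" name="x" value="0"/>
<nvpair id="b" name="y" value="1"/>"""
NEW_CIB = """<nvpair id="a" name="x" value="0"/>
<nvpair id="b" name="y" value="2"/>"""


def cib_delta(old_text, new_text):
    """Return only the removed ('-') and added ('+') lines between two texts."""
    diff = list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="cib-old.xml", tofile="cib-new.xml", lineterm="", n=0))
    # The first two lines are the '---' / '+++' file headers; skip them.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


delta = cib_delta(OLD_CIB, NEW_CIB)
for line in delta:
    print(line)
```

The smaller this delta is, the more realistic a low-downtime upgrade becomes.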
1.) standby-upgrade-activate process: this is the most secure way to upgrade,
but the cluster downtime may be several minutes
- set one node standby
- upgrade heartbeat on the standby node
- set all nodes to standby (downtime starts as soon as the last node is standby)
- load cib-new.xml
- activate node with new heartbeat version (here you have the cluster up
again)
- upgrade heartbeat on all other nodes
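
To see where the downtime falls in the standby-upgrade-activate process, the steps can be laid out as a plan and walked through. This is a minimal sketch with hypothetical node names and action labels instead of real commands. Reactivating the remaining nodes after their upgrade is an assumption; the list above stops at the upgrade.

```python
def standby_upgrade_activate_plan(nodes, new_cib="cib-new.xml"):
    """Return the (target, action) steps of the standby-upgrade-activate process."""
    first, others = nodes[0], nodes[1:]
    plan = [(first, "standby"), (first, "upgrade")]
    # Downtime starts as soon as the last remaining node goes to standby.
    plan += [(node, "standby") for node in others]
    plan.append(("cluster", f"load {new_cib}"))
    # The cluster is up again once the upgraded node is activated.
    plan.append((first, "activate"))
    for node in others:
        plan.append((node, "upgrade"))
        # Assumption: each node is reactivated after its upgrade.
        plan.append((node, "activate"))
    return plan


def downtime_window(plan, nodes):
    """Return (start, end) step indexes between which no node is active."""
    active = set(nodes)
    start = None
    for index, (target, action) in enumerate(plan):
        if action == "standby":
            active.discard(target)
            if not active and start is None:
                start = index
        elif action == "activate":
            active.add(target)
            if start is not None:
                return start, index
    return None


cluster_nodes = ["node1", "node2", "node3"]
upgrade_plan = standby_upgrade_activate_plan(cluster_nodes)
print(downtime_window(upgrade_plan, cluster_nodes))
```

In this layout the only step inside the downtime window is loading the new CIB, which is why the size of the CIB change drives how long the cluster is down.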

2.) set node/resources to unmanaged mode: this means you have a small
cib-delta-change and you really know what kind of effect the change has
(ptest is your friend for that).
So far I have always used version 1) because it seems to be more robust. I tried
version 2) a couple of times, but it sometimes ended with heartbeat dying on a
successive stop (but this may be fixed in newer heartbeat versions).
I'd actually opt for this one and so far didn't experience
problems with it. Though I didn't do it so often. The big
advantage is that the resources don't have to be moved around.
Please post the logs, etc, in case Heartbeat gives up.

I've also found that on a busy heartbeat system /var/lib/pengine needs a clean
before performing an upgrade on all nodes... but as always YMMV :-)


Thanks.

Dejan

kind regards Max




_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems





Dr Peter Clapham, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK






--

The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
