On Mar 21, 2007, at 12:55 PM, Alan Robertson wrote:
Andrew Beekhof wrote:
On 3/20/07, Alan Robertson <[EMAIL PROTECTED]> wrote:
Max Hofer wrote:
OK,
I lost a day just trying to figure out how to replace a cluster node
with a spare part. I thought someone else might need this info, or
maybe knows a better way than how I did it.
Situation:
- cluster with 2 nodes (routing1, routing2)
- routing2 should be replaced with a spare part
- routing1 and routing2 use a file system on a DRBD device to share
common data
Precondition:
- routing2 crashed and hb_uuid is not recoverable
FYI: It's in the CIB, and also in the hb_uuid files on every machine.
- spare part is configured to not start heartbeat after power-on
Steps I did:
* replaced crashed routing2 with spare part (cabling etc.)
* powered on routing2
* on routing2 invalidate the data on the DRBD device (---> resync from
routing1 to routing2)
* on routing1 delete routing2 (I found a bug: pingd resets to 0 when
hb_delnode is called ---> see bug #1535)
# /usr/lib/heartbeat/hb_delnode routing2 && killall pingd
(!!!NOTE: if your cluster configuration triggers a failover on a pingd
failure, put the cluster in unmanaged mode, stop pingd, delete the
node, restart pingd, and then put the cluster back in managed mode)
* on routing1 delete the cache of removed hosts (I'm not sure this step
is necessary, but someone on the mailing list explained that it has to
be done)
# rm /var/lib/heartbeat/delhostcache
* on routing1 add routing2 again
# /usr/lib/heartbeat/hb_addnode routing2
* start heartbeat on routing2
Finished .....
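The steps above can be collected into one shell script run on routing1. This is a minimal sketch under stated assumptions, not a tested tool: it assumes root ssh access from routing1 to the spare part, a DRBD resource name supplied by the caller (e.g. the hypothetical `r0`), and an `/etc/init.d/heartbeat` init script on the spare. The helper names `run`, `run_on` and `replace_node` are hypothetical. By default (`DRY_RUN=1`) it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the node-replacement steps, run on the surviving node.
# ASSUMPTIONS (not from the original post):
#   - root ssh access from this node to the spare part
#   - the DRBD resource name is passed in (e.g. "r0", hypothetical)
#   - heartbeat on the spare is started via /etc/init.d/heartbeat
# With DRY_RUN=1 (the default) commands are only printed.

HB_LIB=/usr/lib/heartbeat
DRY_RUN=${DRY_RUN:-1}

# Print a command, or execute it when DRY_RUN is not 1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run a command on a remote host over ssh.
run_on() {
    target=$1
    shift
    run ssh "$target" "$@"
}

# replace_node SPARE DRBD_RESOURCE
replace_node() {
    spare=$1
    res=$2

    # Discard the spare's DRBD data so it resyncs from this node.
    run_on "$spare" drbdadm invalidate "$res" || return 1
    # Drop the stale node entry. hb_delnode resets pingd to 0
    # (bug #1535), so pingd is killed afterwards, as in the post.
    run "$HB_LIB/hb_delnode" "$spare" || return 1
    run killall pingd || return 1
    # Clear the cache of removed hosts (possibly optional, per the post).
    run rm -f /var/lib/heartbeat/delhostcache || return 1
    # Re-add the node under the same name, then start heartbeat on it.
    run "$HB_LIB/hb_addnode" "$spare" || return 1
    run_on "$spare" /etc/init.d/heartbeat start
}
```

Source the file and call `replace_node routing2 r0` to review the printed commands, then set `DRY_RUN=0` and call it again to execute them. `rm -f` is used so that a missing delhostcache is not treated as an error. The sketch does not implement the NOTE's unmanaged-mode precaution around pingd; do that by hand if a pingd failure would trigger a failover in your configuration.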
What I really find stupid about the whole procedure:
* the assumption that the UUID file (/var/lib/heartbeat/hb_uuid) can be
reused on the spare part probably never holds (unless you perform a
planned replacement ...)
See note above...
* this assumption does not work well if the spare part is meant to
replace any of several cluster nodes. The UUID is created on the very
first install of heartbeat (and thus is not part of my configuration
data). It would be configuration hell to "save all UUIDs of all
clusters after cluster activation" on a system with a couple of nodes
It's already saved for you - in two places on every machine...
What's missing is the conversion from ASCII to binary. Could you make a
bugzilla for that and assign it to me?
Been there, done that:
crm_uuid -w
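The thread says the old UUID is still recorded in the CIB and that `crm_uuid -w` writes an ASCII UUID back into the binary hb_uuid file. A minimal sketch of the lookup half, assuming the CIB `nodes` section lists each node as an element like `<node id="UUID" uname="NAME" .../>` (attribute order may vary); the function name `node_uuid_from_cib` is hypothetical:

```shell
#!/bin/sh
# node_uuid_from_cib NODENAME  (hypothetical helper)
# Reads CIB XML on stdin and prints the id attribute of the <node>
# element whose uname matches NODENAME exactly.
# ASSUMPTION: node entries look like <node id="..." uname="..." .../>.
node_uuid_from_cib() {
    # Put every tag on its own line so one-line CIB dumps work too;
    # "<node " (with a space) does not match <nodes> or <node_state>.
    tr '>' '\n' |
        sed -n "/<node .*uname=\"$1\"/s/.* id=\"\([^\"]*\)\".*/\1/p"
}
```

On routing1, something like `cibadmin -Q -o nodes | node_uuid_from_cib routing2` should print the old UUID; on the spare part, before heartbeat is started, that value would then be handed to `crm_uuid -w`. That `-w` takes the ASCII UUID as its argument is an assumption here; check the tool's own help output on your version.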
Andrew: Is there a man page or other documentation for this, outside
the command itself?
It will be in the set Novell is making available to us.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems