On Mar 21, 2007, at 12:55 PM, Alan Robertson wrote:

Andrew Beekhof wrote:
On 3/20/07, Alan Robertson <[EMAIL PROTECTED]> wrote:

Max Hofer wrote:


OK,

I lost a day just trying to figure out how to replace a cluster node
with a spare part. I thought someone else might need this info, or
maybe knows a better way than the one I used.





Situation:
- cluster with 2 nodes (routing1, routing2)
- routing2 should be replaced with a spare part
- routing1 and routing2 use a file system on a drbd device to share
  common data





Precondition:


- routing2 crashed and hb_uuid is not recoverable



FYI: It's in the CIB, and also in the hb_uuid files on every machine.




- spare part is configured to not start heartbeat after power-on





Steps I did:


* replaced crashed routing2 with spare part (cabling etc.)


* powered on routing2


* on routing2, invalidate the data on the drbd device (---> sync from
  routing1 to routing2)


* on routing1, delete routing2 (I found a bug: pingd resets to 0
  when hb_delnode is called ---> see bug #1535)


# /usr/lib/heartbeat/hb_delnode routing2 && killall pingd


(!!!NOTE: if your cluster configuration triggers a failover on a pingd
failure, put the cluster in unmanaged mode, stop pingd, delete the
node, then restart pingd and put the cluster back in managed mode)


* on routing1, delete the cache of removed hosts (I'm not sure if this
  step is necessary, but someone on the mailing list explained it has
  to be done)


# rm /var/lib/heartbeat/delhostcache


* on routing1 add routing2 again


# /usr/lib/heartbeat/hb_addnode routing2


* start heartbeat on routing2





Finished .....
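
The routing1-side steps above can be collected into a small script.
This is only a rough sketch, assuming the paths from this post; the
DRY_RUN switch and the names run and replace_node are made up for
illustration and are not part of heartbeat. By default it only prints
the commands. It does not handle the drbd invalidation on routing2 or
the managed/unmanaged switch from the NOTE, and it uses "rm -f" rather
than plain "rm" so that a missing delhostcache does not stop the
sequence.

```shell
#!/bin/sh
# Sketch of the steps run on routing1 (hypothetical helper, not shipped
# with heartbeat). DRY_RUN=1 prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
HB_LIB="${HB_LIB:-/usr/lib/heartbeat}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Remove the crashed node and add it back under the same name.
# If a pingd failure triggers a failover in your configuration, put the
# cluster in unmanaged mode first (see the NOTE in the steps above).
replace_node() {
    node="$1"
    run "$HB_LIB/hb_delnode" "$node" &&   # delete the old node entry
    run killall pingd &&                  # workaround for bug #1535
    run rm -f /var/lib/heartbeat/delhostcache &&
    run "$HB_LIB/hb_addnode" "$node"      # re-add the node
}
```

Calling "replace_node routing2" with DRY_RUN=1 lists the four commands
so they can be checked first; with DRY_RUN=0 they are executed in
order and the sequence stops at the first failure. Heartbeat on
routing2 is started by hand afterwards.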





What I really find stupid about the whole procedure:
* the assumption that the UUID file (/var/lib/heartbeat/hb_uuid) can
  be reused on the spare part probably never holds (unless you
  perform a planned replacement ... )



See note above...




* this assumption does not work well if the spare part is installed to
  be a replacement for different cluster nodes. The UUID is created
  on the very first install of heartbeat (and thus is not part of my
  configuration data). It would be configuration hell to "save all
  UUIDs of all clusters after cluster activation" on a system with a
  couple of nodes



It's already saved for you - in two places on every machine...



What's missing is the conversion from ASCII to binary. Could you make a

bugzilla for that and assign it to me?



been there done that:
 crm_uuid -w

Andrew: Is there a man page or other documentation outside the command
for this?

it will be in the set novell is making available to us
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
