Hi,

>> Ok - so taking my cluster, erasing the cib with cibadmin -E, and
>> rebooting both nodes. I've not got httpd starting by default on
>> either machine, so when they come up, I will start httpd on one
>> machine. Interestingly the result of cibadmin -E seems to have been
>> that cibadmin -Q now times out,
>
> Shouldn't happen.

I'll try to reproduce that again.

>
>> so I've hacked around a bit deleting
>> /var/lib/heartbeat/crm/cib.xml and trying to load it, by making the
>> admin_epoch bigger than that which seemed to be there (though from
>> where I know not).
>
> Fiddling with cib.xml is allowed only when heartbeat/CRM is not
> running. Otherwise, and that's preferred, use the CRM tools
> (crm_resource, cibadmin, etc).

Sorry - I should have been clearer. I only made changes to an exported
cib.xml, and imported them using cibadmin.

>> I shouldn't be able to move the resource back to node2 - it still has
>> a failure count > 0.
>>
>> However, it seems I can - using crm_resource -M -r httpd_2 -H node2
>
> This inserts a -INFINITY location constraint...

Ok...

>> Ok - resetting the failcount to 0. The cluster should be in the same
>> state it was before - let's try to kill apache.
>>
>> This time, apache seems to have restarted on node 2, and there was no
>> failover. I don't understand this. The failcount has gone back up to
>> 1, but the resource hasn't moved.
>
> ... which prevents it from ever again starting on this node.
> crm_resource should have printed a warning about it.

I thought that was only if the -H some.host.name is omitted.

> crm_resource -U removes the -INFINITY constraint, hence now the
> cluster should start to behave as you expect it.

I'll try that.

>> So - what's going on - what have I got wrong? Also could someone
>> please tell me the canonical way to reset the cluster, and import a
>> new cib.xml?
>
> cibadmin -R -x cib.xml should do (perhaps cibadmin -E before,
> can't recall anymore). It may happen that, if your old resource
> names don't exist in the new configuration, there will be some
> remnants in the status section of the CIB. Those can be removed
> by crm_resource -C or by restarting heartbeat.

Hrm - that's what I've been doing.

> Or stop the cluster, remove cib.xml and cib.xml.sig on all nodes
> (from /var/lib/heartbeat/crm), copy new cib.xml to all nodes,
> start cluster. Use crm_verify to make sure that your cib.xml is
> not broken.

Will try this approach.

S.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
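
For reference, the offline reset suggested in the thread (stop the cluster, remove cib.xml and cib.xml.sig, copy in the new file, start again) could be scripted roughly as below. This is only a sketch under assumptions: the node names, the location of the new cib.xml, the use of ssh/scp to reach the nodes, the /etc/init.d/heartbeat init script and the -x option to crm_verify are not taken from the thread and may differ on a given installation. It only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run sketch of the offline CIB reset described in the thread.
# Commands are printed, not executed, unless RUN=1 is set.

CRM_DIR=/var/lib/heartbeat/crm     # directory named in the thread
NEW_CIB=${NEW_CIB:-./cib.xml}      # assumption: where the new CIB file lives
NODES=${NODES:-"node1 node2"}      # assumption: hypothetical node names

# Print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

reset_cib() {
    # Check the new file first so a broken cib.xml is never copied out.
    # Assumption: crm_verify takes the file via -x.
    run crm_verify -x "$NEW_CIB" || return 1

    # Stop heartbeat everywhere before touching cib.xml.
    # Assumption: init script path and ssh access to each node.
    for n in $NODES; do
        run ssh "$n" /etc/init.d/heartbeat stop
    done

    # Remove the old CIB and its signature, then copy in the new file.
    for n in $NODES; do
        run ssh "$n" rm -f "$CRM_DIR/cib.xml" "$CRM_DIR/cib.xml.sig"
        run scp "$NEW_CIB" "$n:$CRM_DIR/cib.xml"
    done

    # Start the cluster again on all nodes.
    for n in $NODES; do
        run ssh "$n" /etc/init.d/heartbeat start
    done
}

reset_cib
```

Running it unchanged lists the steps in order, which is a cheap way to review them before setting RUN=1. The stop loop finishes on every node before any file is removed, in line with the advice that cib.xml should only be touched while heartbeat/CRM is not running.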
