On Wed, Jun 4, 2008 at 10:24 AM, Geoffroy ARNOUD <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> We are setting up a MySQL HA cluster, with Heartbeat/Pacemaker and DRBD.
> Heartbeat is configured as follows:
> - DRBD as a master/slave resource
> - MySQL as a resource group with the following primitives:
>   * a virtual IP address (IPaddr2)
>   * a file system
>   * MySQL
> - 2 constraints between DRBD and the resource group (rsc_order and
>   rsc_colocation)
>
> We have some trouble figuring out how scores are computed.
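For anyone following along, a setup like the one described above would look roughly like this in the 0.6-era CIB XML (a minimal sketch only -- all IDs, attribute names and values here are illustrative, not taken from Geoffroy's actual configuration; check the Pacemaker/DRBD docs for the exact syntax of your version):

```xml
<resources>
  <!-- DRBD as a master/slave resource -->
  <master_slave id="ms-drbd0">
    <primitive id="drbd0" class="ocf" provider="heartbeat" type="drbd">
      <instance_attributes id="drbd0-ia">
        <attributes>
          <nvpair id="drbd0-res" name="drbd_resource" value="r0"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </master_slave>
  <!-- VIP + filesystem + MySQL as an ordered group -->
  <group id="mysql-group">
    <primitive id="vip"   class="ocf" provider="heartbeat" type="IPaddr2"/>
    <primitive id="fs"    class="ocf" provider="heartbeat" type="Filesystem"/>
    <primitive id="mysql" class="ocf" provider="heartbeat" type="mysql"/>
  </group>
</resources>
<constraints>
  <!-- group starts only after DRBD is promoted, and only on the master -->
  <rsc_order id="mysql-after-drbd" from="mysql-group" to="ms-drbd0"
             to_action="promote"/>
  <rsc_colocation id="mysql-on-drbd-master" from="mysql-group"
                  to="ms-drbd0" to_role="master" score="INFINITY"/>
</constraints>
```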
[snip]
> With other values of stickiness, it is possible to fall into a case
> where the MySQL database has a negative score for the master node, but
> the master score of DRBD is greater than the slave node's. Therefore,
> heartbeat refuses to restart the database, but won't migrate the
> resources
[snip]

For 0.6.4 (in response to a bug report), we changed how colocation worked. This change had the seemingly positive effect that colocating with another resource didn't set the score to INFINITY - you could still see the score the resource had for the node. It also meant that positive and negative colocation constraints behaved consistently.

Unfortunately this change exposed some problems in how those scores were calculated, and it was possible to create situations (as you saw) where the group couldn't be fully recovered on the current node but also wouldn't move to a new one.

I spent much time with Dominik over the last few days and I think we finally have it in acceptable shape. Basically:

  N ::= number of resources in the group
  M ::= maximum number of failures by any member of the group

Assuming no rsc_location constraints, the group will migrate when:

  (N * stickiness) < (M * failure_stickiness)

I sincerely hope this is the last scoring change we need to make for 0.6.

In 1.0 things get considerably better, as we've dropped failure_stickiness and instead introduced a migration-threshold. You set migration-threshold=X and, until a resource fails X times, the scores don't change. After the Xth failure, the node gets -INFINITY and that's the end of the story.

So at this point, you can:
* wait for 0.6.5 in a week or two
* continue using 0.6.4 until 0.6.5 comes out
* build your own copy of 0.6.4+ from
  http://hg.clusterlabs.org/pacemaker/stable-0.6/archive/tip.tar.bz2
* install 0.6.3

This is a serious bug - services that could have been kept active may instead remain stopped.
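To make the condition concrete, here is a toy sketch of it (plain arithmetic, not a Pacemaker API -- the function name and example numbers are mine):

```python
def group_migrates(n_resources, max_failures, stickiness, failure_stickiness):
    """Return True when the 0.6 scoring condition says the group moves:
    the accumulated failure penalty outweighs the group's total
    stickiness on its current node."""
    return (n_resources * stickiness) < (max_failures * failure_stickiness)

# A 3-resource group (VIP, filesystem, MySQL) with stickiness=100:
print(group_migrates(3, 1, 100, 500))  # True:  300 < 500, one failure moves it
print(group_migrates(3, 1, 100, 200))  # False: 300 >= 200, group stays put
```

So with a large stickiness relative to failure_stickiness, the group will ride out several failures before relocating; with the reverse ratio, a single failure is enough to move it.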
But please keep in mind that this bug will not cause data corruption, and the services must have already suffered multiple failures before it can be triggered.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
