Dejan Muhamedagic wrote:
Hi,
On Thu, Feb 21, 2008 at 06:40:57PM +0100, Zoltan Boszormenyi wrote:
Zoltan Boszormenyi wrote:
Hi,
we have a problem with automatic IPaddr failback on a system.
There are two nodes, and IPaddr prefers to run on the "master" node.
Static score for that is 20. Resource stickiness for IPaddr is 40.
Pingd is set up the same way the documentation mentions, ha.cf has this:
respawn root /usr/lib64/heartbeat/pingd -m 100 -d 5s
Also, the node that loses the network connection to the ping node
gives up its IPaddr, again from the docs:
<rule id="virt_ip_connected" score_attribute="pingd">
  <expression id="virt_ip_connected_defined" attribute="pingd"
              operation="defined"/>
</rule>
<rule id="virt_ip_unconnected" score="-INFINITY" boolean_op="or">
  <expression id="virt_ip_unconnected_undefined" attribute="pingd"
              operation="not_defined"/>
  <expression id="virt_ip_unconnected_zero" attribute="pingd"
              operation="lte" value="0"/>
</rule>
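As a rough illustration of how these two rules are meant to combine for a single node, here is a minimal Python sketch. It is not the policy engine's code: the function name location_score and the use of float infinity as a stand-in for -INFINITY are assumptions made only for this example.

```python
import math

NEG_INF = -math.inf  # stand-in for Heartbeat's -INFINITY (assumption)


def location_score(attrs):
    """Score contributed by the two pingd rules for one node.

    attrs is a dict of node attributes, e.g. {"pingd": "100"}.
    """
    pingd = attrs.get("pingd")
    # Rule virt_ip_unconnected: pingd not defined OR pingd <= 0
    if pingd is None or float(pingd) <= 0:
        return NEG_INF
    # Rule virt_ip_connected: pingd is defined, score is taken from its value
    return float(pingd)


print(location_score({"pingd": "100"}))  # node that can reach the ping node
print(location_score({"pingd": "0"}))    # node that lost the ping node
print(location_score({}))                # attribute never set
```

Under these assumptions a node that loses the ping node is excluded outright, whatever its static preference or stickiness.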
This _should_ mean the following scoring matrix and transition flow:
                            master                       slave
                            static  stickiness  pingd    static  stickiness  pingd
IPaddr not running          20      0           100      0       0           100
    decision is to run IPaddr on master
IPaddr running on master    20      40          100      0       0           100
    master loses connection
IPaddr running on master    20      40          0        0       0           100
    IPaddr migrated to slave
IPaddr running on slave     20      0           0        0       40          100
    master restores connection
IPaddr running on slave     20      0           100      0       40          100
So, at this point, master has 120 points, slave has 140 points.
So, it should stay on the slave. But it doesn't stay; it's migrated
back to master. By trial and error, I raised resource_stickiness
to 200, and now it stays on the slave.
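The arithmetic behind the 120 versus 140 figures can be written out as a small Python sketch. It assumes the purely additive model described in this mail (static preference + stickiness + pingd), which is the theory being questioned here and not confirmed policy engine behaviour; the helper names are made up for the example.

```python
def node_score(static, stickiness, pingd):
    """Total score for one node under the additive model (assumption)."""
    return static + stickiness + pingd


def stays_on_slave(stickiness, master_static=20, pingd=100):
    """True if the slave outscores the master once both see the ping node."""
    master = node_score(master_static, 0, pingd)
    slave = node_score(0, stickiness, pingd)
    return slave > master


# State after the master restores its connection while IPaddr runs on the slave
master = node_score(static=20, stickiness=0, pingd=100)
slave = node_score(static=0, stickiness=40, pingd=100)
print(master, slave)          # 120 140
print(stays_on_slave(40))     # True
print(stays_on_slave(20))     # False: a tie does not favour the slave here
```

If this model holds, any stickiness above the static preference of 20 should be enough to keep IPaddr on the slave.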
This question still stands. Why doesn't it work with
resource_stickiness=40?
Is my theory wrong? Does the scoring system work differently?
There's a script somebody posted on the list a few times which
calculates scores from the pe input files (the transition
graphs). Just found it here:
http://hg.clusterlabs.org/pacemaker/dev/raw-file/tip/contrib/showscores.sh
The pe inputs are in /var/lib/heartbeat/pengine. This way you can
watch how they change between transitions. In particular, there's
a gotcha with groups, i.e. in order for a group to move, you'd
need to add scores for all resources from the group. Otherwise,
I'm not an expert with scores, so I can't give you more specific
advice.
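To watch the pe inputs change between transitions, it helps to list them in the order they were written. A minimal Python sketch follows; it assumes nothing about the file names and only sorts whatever is in the directory by modification time. The function name list_pe_inputs is made up for this example.

```python
import os


def list_pe_inputs(directory="/var/lib/heartbeat/pengine"):
    """Return the regular files in the directory, oldest first by mtime."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [path for path in paths if os.path.isfile(path)]
    return sorted(files, key=os.path.getmtime)


if __name__ == "__main__":
    # Only print when the default directory exists on this machine
    if os.path.isdir("/var/lib/heartbeat/pengine"):
        for path in list_pe_inputs():
            print(path)
```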
Thanks,
Dejan
Thanks for the link, I get this as the result.
ws247:~/ha # ./showscores.sh
Warning: Script running not on DC. Might be slow(!)
Resource        Score      Node   Stickiness  Failcount  Failure-Stickiness
pgsql_master    0          ws247              0
pgsql_master    -INFINITY  ws237              0
pgsql_slave     0          ws237
pgsql_slave     -INFINITY  ws247              0
replay          0          ws247              0
replay          -INFINITY  ws237
slavemigration  INFINITY   ws237
slavemigration  -INFINITY  ws247              0
ticker          -INFINITY  ws237
ticker          -INFINITY  ws247
virt_ip         120        ws247  200         0
virt_ip         300        ws237  200         0
So, my calculation seems correct:
- 100 points for the pingd attribute + 20 static preference on the master
- 100 points for the pingd attribute + 200 for the stickiness on the slave
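The same comparison can be done mechanically on the (resource, score, node) triples from the script output. The Python sketch below is only an illustration: the helper names are made up, INFINITY is mapped to float infinity as an assumption, and the rows are copied by hand from the output above.

```python
import math


def parse_score(text):
    """Convert a score field such as '120', 'INFINITY' or '-INFINITY'."""
    if text == "INFINITY":
        return math.inf
    if text == "-INFINITY":
        return -math.inf
    return float(text)


def preferred_nodes(rows):
    """Map each resource to the node with the highest score.

    rows holds (resource, score_text, node) tuples.
    """
    best = {}
    for resource, score_text, node in rows:
        score = parse_score(score_text)
        if resource not in best or score > best[resource][0]:
            best[resource] = (score, node)
    return {resource: node for resource, (score, node) in best.items()}


rows = [
    ("virt_ip", "120", "ws247"),
    ("virt_ip", "300", "ws237"),
    ("slavemigration", "INFINITY", "ws237"),
    ("slavemigration", "-INFINITY", "ws247"),
]
print(preferred_nodes(rows))
```

Note that the sketch does not handle a resource such as ticker, which is -INFINITY on both nodes and so cannot run anywhere; it would simply report the first node it sees.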
Going back to resource_stickiness=40 shows the same values I calculated.
And to my surprise, disconnecting the master from the network and
later reconnecting doesn't make the virt_ip fail back. It stays on the slave
where it should. What's the term for such bugs that disappear when
looked at? "Heisenbug"? :-)
Well, it may be that the system had manual migration constraints when
I tested with resource_stickiness=40.
Unfortunately, raising resource_stickiness helped only on my testing
setup. On the real machines IPaddr is migrated back to the master at
both resource_stickiness values.
This detail above was solved. On the production system IPaddr
was migrated forcibly to the master once and the constraint that
was automatically created by the migration wasn't deleted yet.
Sorry for the noise.
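A leftover migration constraint like this can be spotted by scanning a dump of the CIB (for example from cibadmin -Q) for location constraints that were not written by hand. The Python sketch below assumes that constraints created by a manual migration carry ids starting with "cli-"; check that against your own CIB, since the prefix is an assumption here. The sample XML is hypothetical and heavily trimmed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed CIB fragment used only for this illustration
SAMPLE_CIB = """
<cib>
  <configuration>
    <constraints>
      <rsc_location id="virt_ip_location" rsc="virt_ip"/>
      <rsc_location id="cli-prefer-virt_ip" rsc="virt_ip"/>
    </constraints>
  </configuration>
</cib>
"""


def leftover_migration_constraints(cib_xml, prefix="cli-"):
    """Return (id, rsc) pairs of location constraints whose id has the prefix."""
    root = ET.fromstring(cib_xml)
    return [
        (loc.get("id"), loc.get("rsc"))
        for loc in root.iter("rsc_location")
        if loc.get("id", "").startswith(prefix)
    ]


print(leftover_migration_constraints(SAMPLE_CIB))
```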
The machines are running SLES10 SP1, heartbeat package is 2.1.3-0.6
coming from SuSE/Novell. It's a preview package from SLES10 SP2.
Can someone explain it to me?
Best regards,
--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/