Hi list,

I'm experiencing a strange problem and I can't figure out what's wrong. I'm running openSUSE 12.3 on a 2-node cluster with pacemaker-1.1.9-55.2.x86_64.

Every 15 minutes the following messages are logged. If stonith is enabled, my hosts get fenced afterwards. The messages appeared on both nodes until I temporarily activated stonith. After some fencing, only the second node "s00202" now keeps logging these (both nodes are online again):

Aug 27 10:21:13 s00202 crmd[3390]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Aug 27 10:21:13 s00202 pengine[3389]: notice: unpack_config: On loss of CCM Quorum: Ignore
Aug 27 10:21:13 s00202 pengine[3389]: error: filter_colocation_constraint: pri_fs_vmail and ms_drbd_vmail are both allocated but to different nodes: s00202 vs. n/a
Aug 27 10:21:13 s00202 pengine[3389]: error: filter_colocation_constraint: pri_fs_www and ms_drbd_www are both allocated but to different nodes: s00201 vs. n/a
Aug 27 10:21:13 s00202 pengine[3389]: error: filter_colocation_constraint: pri_fs_www and ms_drbd_www are both allocated but to different nodes: s00201 vs. n/a
Aug 27 10:21:13 s00202 pengine[3389]: error: filter_colocation_constraint: pri_fs_mysql and ms_drbd_mysql are both allocated but to different nodes: s00201 vs. n/a
Aug 27 10:21:13 s00202 pengine[3389]: error: filter_colocation_constraint: pri_fs_redis and ms_drbd_redis are both allocated but to different nodes: s00201 vs. n/a
Aug 27 10:21:13 s00202 pengine[3389]: error: filter_colocation_constraint: pri_fs_bind and ms_drbd_bind are both allocated but to different nodes: s00202 vs. n/a
Aug 27 10:21:13 s00202 pengine[3389]: error: filter_colocation_constraint: pri_fs_squid and ms_drbd_squid are both allocated but to different nodes: s00202 vs. n/a
Aug 27 10:21:13 s00202 pengine[3389]: error: filter_colocation_constraint: pri_fs_www and ms_drbd_www are both allocated but to different nodes: s00201 vs. n/a
Aug 27 10:21:13 s00202 pengine[3389]: last message repeated 3 times
Aug 27 10:21:13 s00202 crmd[3390]: notice: do_te_invoke: Processing graph 61 (ref=pe_calc-dc-1377591673-1345) derived from /var/lib/pacemaker/pengine/pe-input-53.bz2
Aug 27 10:21:13 s00202 crmd[3390]: notice: run_graph: Transition 61 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-53.bz2): Complete
Aug 27 10:21:13 s00202 crmd[3390]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Aug 27 10:21:13 s00202 pengine[3389]: notice: process_pe_message: Calculated Transition 61: /var/lib/pacemaker/pengine/pe-input-53.bz2


I believe that the filter_colocation_constraint errors all have the same cause, so I'm going to reduce the problem to the following:

Aug 27 10:21:13 s00202 pengine[3389]: error: filter_colocation_constraint: pri_fs_vmail and ms_drbd_vmail are both allocated but to different nodes: s00202 vs. n/a
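
For reference, I believe the transition that produced this error can be replayed offline from the saved pengine input that is mentioned in the log (this assumes crm_simulate from the pacemaker package is available):

# crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-53.bz2

Running this reproduces the same filter_colocation_constraint errors without touching the live cluster.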


This is my configuration (relevant extract):

# crm configure show

node s00201
node s00202
primitive pri_drbd_vmail ocf:linbit:drbd \
operations $id="pri_drbd_vmail-operations" \
op monitor interval="20" role="Slave" timeout="20" \
op monitor interval="10" role="Master" timeout="20" \
params drbd_resource="vmail"
primitive pri_fs_vmail ocf:heartbeat:Filesystem \
params device="/dev/drbd5" fstype="ext4" directory="/var/vmail" \
op monitor interval="30"
primitive pri_ip_vmail ocf:heartbeat:IPaddr2 \
operations $id="pri_ip_vmail-operations" \
op monitor interval="10s" timeout="20s" \
params ip="10.0.1.105" nic="br0"
primitive pri_svc_postfix ocf:heartbeat:postfix \
operations $id="pri_svc_postfix-operations" \
op monitor interval="60s" timeout="20s" \
params config_dir="/etc/postfix_vmail"
group grp_vmail pri_fs_vmail pri_ip_vmail pri_svc_postfix \
meta target-role="Started"
ms ms_drbd_vmail pri_drbd_vmail \
meta notify="true" target-role="Started" master-max="1" is-managed="true"
colocation col_grp_vmail_ON_drbd_vmail inf: grp_vmail:Started ms_drbd_vmail:Master
order ord_ms_drbd_vmail_BEFORE_grp_vmail inf: ms_drbd_vmail:promote grp_vmail:start
property $id="cib-bootstrap-options" \
no-quorum-policy="ignore" \
placement-strategy="balanced" \
dc-version="1.1.9-2db99f1" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
last-lrm-refresh="1377585341" \
stonith-enabled="false"
rsc_defaults $id="rsc-options" \
resource-stickiness="100" \
migration-threshold="3"
op_defaults $id="op-options" \
timeout="600" \
record-pending="false"
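
In case the constraint syntax itself is at fault: as far as I know, the configuration can be checked for consistency with the crm shell (this runs crm_verify against the live CIB):

# crm configure verify

It reports no errors here, which is part of what confuses me.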


# crm_resource -L

Master/Slave Set: ms_drbd_vmail [pri_drbd_vmail]
Masters: [ s00202 ]
Slaves: [ s00201 ]
Resource Group: grp_vmail
pri_fs_vmail       (ocf::heartbeat:Filesystem):    Started
pri_ip_vmail       (ocf::heartbeat:IPaddr2):       Started
pri_svc_postfix    (ocf::heartbeat:postfix):       Started


# ptest -Ls | egrep "pri_fs_vmail|ms_drbd_vmail|grp_vmail"

group_color: grp_vmail allocation score on s00201: 0
group_color: grp_vmail allocation score on s00202: 0
group_color: pri_fs_vmail allocation score on s00201: 0
group_color: pri_fs_vmail allocation score on s00202: 100
clone_color: ms_drbd_vmail allocation score on s00201: 0
clone_color: ms_drbd_vmail allocation score on s00202: 600
native_color: pri_fs_vmail allocation score on s00201: -INFINITY
native_color: pri_fs_vmail allocation score on s00202: 10700


Apart from these error messages, both hosts are working fine (with stonith disabled). Starting, stopping, and migrating resources also works without problems.

I have no idea what's wrong, and Google didn't help in this case.
Maybe someone here can help me out?


Thank you.

Regards,
Thomas
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems