[Linux-HA] Mysql multiple slaves, slaves restarting occasionally without a reason

Attila Megyeri Thu, 12 Sep 2013 00:45:56 -0700

Hi,

We have a MySQL cluster that works fine with a single master ("A") and a 
single slave ("B"). Failover is almost immediate and I am happy with this 
approach. After we configured two additional slaves, strange things started to 
happen. From time to time, all of the slave MySQL instances are restarted and 
I cannot figure out why.


I tried to find out what is happening, and this is how far I got:

There is a repeating sequence in the log on the DC, which looks like this when 
everything is fine:

Sep 10 01:45:42 oamgr crmd: [3385]: notice: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Sep 10 01:45:42 oamgr crmd: [3385]: info: do_te_invoke: Processing graph 71358 
(ref=pe_calc-dc-1378777542-165977) derived from 
/var/lib/pengine/pe-input-3179.bz2
Sep 10 01:45:42 oamgr crmd: [3385]: notice: run_graph: ==== Transition 71358 
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-input-3179.bz2): Complete
Sep 10 01:45:42 oamgr crmd: [3385]: notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 10 01:47:42 oamgr crmd: [3385]: info: crm_timer_popped: PEngine Recheck 
Timer (I_PE_CALC) just popped (120000ms)
Sep 10 01:47:42 oamgr crmd: [3385]: notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
origin=crm_timer_popped ]
Sep 10 01:47:42 oamgr crmd: [3385]: info: do_state_transition: Progressed to 
state S_POLICY_ENGINE after C_TIMER_POPPED
....

However, it looks somewhat different when the restarts occur:

....
Sep 10 01:51:42 oamgr crmd: [3385]: notice: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Sep 10 01:51:42 oamgr crmd: [3385]: info: do_te_invoke: Processing graph 71361 
(ref=pe_calc-dc-1378777902-165980) derived from 
/var/lib/pengine/pe-input-3179.bz2
Sep 10 01:51:42 oamgr crmd: [3385]: notice: run_graph: ==== Transition 71361 
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-input-3179.bz2): Complete
Sep 10 01:51:42 oamgr crmd: [3385]: notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 10 01:52:45 oamgr crmd: [3385]: info: abort_transition_graph: 
te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair, 
id=status-oadb2-master-db-mysql.1, name=master-db-mysql:1, value=0, magic=NA, 
cib=0.4829.3480) : Transient attribute: update
Sep 10 01:52:45 oamgr crmd: [3385]: notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Sep 10 01:52:45 oamgr crmd: [3385]: info: abort_transition_graph: 
te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair, 
id=status-oadb2-readable, name=readable, value=0, magic=NA, cib=0.4829.3481) : 
Transient attribute: update
.....
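
For anyone who wants to pull these abort entries out of a longer syslog, here 
is a minimal Python sketch. It assumes the log format shown above; the helper 
names (unwrap, attribute_aborts) are made up for illustration and are not part 
of Pacemaker. It rejoins the wrapped lines and extracts the attribute id, name 
and value from each abort_transition_graph entry.

```python
import re

# Two entries copied from the excerpt above, wrapped as in this mail.
SAMPLE = """\
Sep 10 01:52:45 oamgr crmd: [3385]: info: abort_transition_graph:
te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair,
id=status-oadb2-master-db-mysql.1, name=master-db-mysql:1, value=0, magic=NA,
cib=0.4829.3480) : Transient attribute: update
Sep 10 01:52:45 oamgr crmd: [3385]: info: abort_transition_graph:
te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair,
id=status-oadb2-readable, name=readable, value=0, magic=NA, cib=0.4829.3481) :
Transient attribute: update
"""

# A syslog entry starts with a timestamp such as "Sep 10 01:52:45 ".
STAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")

# Fields of interest inside an attribute-triggered transition abort.
ABORT = re.compile(
    r"^(?P<time>\w{3} [ \d]\d \d\d:\d\d:\d\d) .*abort_transition_graph: .*"
    r"id=(?P<id>[^,]+), name=(?P<name>[^,]+), value=(?P<value>[^,]+),"
)


def unwrap(text):
    """Rejoin log entries that were wrapped onto several lines."""
    entries = []
    for line in text.splitlines():
        if STAMP.match(line) or not entries:
            entries.append(line.strip())
        else:
            entries[-1] += " " + line.strip()
    return entries


def attribute_aborts(text):
    """Return (time, id, name, value) for each attribute-triggered abort."""
    found = []
    for entry in unwrap(text):
        m = ABORT.match(entry)
        if m:
            found.append((m["time"], m["id"], m["name"], m["value"]))
    return found


if __name__ == "__main__":
    for row in attribute_aborts(SAMPLE):
        print(row)
```

On the two entries above, this reports the master-db-mysql:1 and readable 
attributes, both with value 0 and both with an id that contains oadb2.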

There is a transition abort, and shortly after this the slaves are restarted:


....
Sep 10 01:52:45 oamgr pengine: [3384]: notice: LogActions: Move    db-mysql:1   
(Slave oadb2 -> huoadb1)
Sep 10 01:52:45 oamgr pengine: [3384]: notice: LogActions: Move    db-mysql:2   
(Slave huoadb1 -> oadb2)
Sep 10 01:52:45 oamgr crmd: [3385]: notice: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Sep 10 01:52:45 oamgr crmd: [3385]: info: do_te_invoke: Processing graph 71362 
(ref=pe_calc-dc-1378777965-165981) derived from 
/var/lib/pengine/pe-input-3180.bz2
Sep 10 01:52:45 oamgr crmd: [3385]: info: te_rsc_command: Initiating action 
148: notify db-mysql:0_pre_notify_stop_0 on oadb1
Sep 10 01:52:45 oamgr crmd: [3385]: info: te_rsc_command: Initiating action 
150: notify db-mysql:1_pre_notify_stop_0 on oadb2
Sep 10 01:52:45 oamgr crmd: [3385]: info: te_rsc_command: Initiating action 
151: notify db-mysql:2_pre_notify_stop_0 on huoadb1
Sep 10 01:52:45 oamgr crmd: [3385]: info: te_rsc_command: Initiating action 
152: notify db-mysql:3_pre_notify_stop_0 on huoadb2
Sep 10 01:52:45 oamgr pengine: [3384]: notice: process_pe_message: Transition 
71362: PEngine Input stored in: /var/lib/pengine/pe-input-3180.bz2
Sep 10 01:52:45 oamgr crmd: [3385]: info: te_rsc_command: Initiating action 39: 
stop db-mysql:1_stop_0 on oadb2
Sep 10 01:52:45 oamgr crmd: [3385]: info: te_rsc_command: Initiating action 43: 
stop db-mysql:2_stop_0 on huoadb1
....

It appears that oadb2 and huoadb1 are swapped with each other (db-mysql:1 and 
db-mysql:2 trade nodes). Does that make any sense?
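
To check whether the same kind of swap shows up at every restart, the 
LogActions lines can be scanned mechanically. The following is only a sketch 
under the assumption that the pengine lines always look like the two above; the 
function names (planned_moves, swapped_pairs) are hypothetical helpers, not 
Pacemaker tools.

```python
import re

# The two LogActions entries from the excerpt above, wrapped as in this mail.
SAMPLE = """\
Sep 10 01:52:45 oamgr pengine: [3384]: notice: LogActions: Move    db-mysql:1
(Slave oadb2 -> huoadb1)
Sep 10 01:52:45 oamgr pengine: [3384]: notice: LogActions: Move    db-mysql:2
(Slave huoadb1 -> oadb2)
"""

# Matches e.g. "LogActions: Move db-mysql:1 (Slave oadb2 -> huoadb1)".
MOVE = re.compile(
    r"LogActions:\s+Move\s+(?P<rsc>\S+)\s+"
    r"\((?P<role>\w+)\s+(?P<src>\S+)\s+->\s+(?P<dst>[^\s)]+)\)"
)


def planned_moves(text):
    """Return (resource, source node, target node) for each planned Move."""
    flat = " ".join(text.split())  # undo the mail wrapping
    return [(m["rsc"], m["src"], m["dst"]) for m in MOVE.finditer(flat)]


def swapped_pairs(moves):
    """Return pairs of resources that simply trade nodes with each other."""
    pairs = []
    for i, (rsc_a, src_a, dst_a) in enumerate(moves):
        for rsc_b, src_b, dst_b in moves[i + 1:]:
            if src_a == dst_b and dst_a == src_b:
                pairs.append((rsc_a, rsc_b))
    return pairs


if __name__ == "__main__":
    moves = planned_moves(SAMPLE)
    print(moves)
    print(swapped_pairs(moves))
```

For the excerpt above, it pairs db-mysql:1 with db-mysql:2, which matches what 
I read from the log by eye.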

It happens only when all 4 MySQL nodes are online (oadb1, oadb2, huoadb1, 
huoadb2). When I put oadb2 into standby for a day, I did not see any restarts.

Could someone help me troubleshoot this?


MySQL version is 5.1.66
Pacemaker 1.1.7
Corosync 1.4.2
MySQL RA is the latest from GitHub


Thanks in advance,

Attila



