> On 21 Jan 2015, at 3:38 am, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> 
> wrote:
> 
> Hi!
> 
> When a SLES11SP3 node joined a 3-node cluster after a reboot (and preceding 
> update), a node with up-to-date software showed these messages (I feel they 
> should not appear):
> 
> Jan 20 17:12:38 h10 corosync[13220]:  [MAIN  ] Completed service 
> synchronization, ready to provide service.
> Jan 20 17:12:38 h10 cib[13257]:  warning: crm_find_peer: Node 'h01' and 'h01' 
> share the same cluster nodeid: 739512321
> Jan 20 17:12:38 h10 cib[13257]:  warning: crm_find_peer: Node 'h01' and 'h01' 
> share the same cluster nodeid: 739512321
> Jan 20 17:12:38 h10 cib[13257]:  warning: crm_find_peer: Node 'h01' and 'h01' 
> share the same cluster nodeid: 739512321

We fixed this upstream a little while back.
I think the fix even came from SUSE.
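
For context, the warning is a consistency check on pacemaker's peer cache:
the cache is keyed by nodeid, and once h01 came back under a new id there
were two entries carrying the uname 'h01', so any lookup that cross-checks
id against name trips over the duplicate. A toy sketch of that state in
Python (purely illustrative, not the actual C code behind crm_find_peer):

    # Toy model of the peer cache after h01 rejoined with a new nodeid.
    # Purely illustrative; the real logic is C code inside pacemaker.
    peer_cache = {
        17831084:  "h01",   # stale entry from before the update
        739512321: "h01",   # entry announced by the rejoining node
        84939948:  "h05",
        739512330: "h10",
    }

    def find_peer(nodeid, uname):
        """Look up a peer by id, cross-checking the name index."""
        ids_for_name = [i for i, n in peer_cache.items() if n == uname]
        if len(ids_for_name) > 1:
            # This is the state the quoted logs complain about:
            print("crit: Node %d and %d share the same name '%s'"
                  % (ids_for_name[0], ids_for_name[1], uname))
        return peer_cache.get(nodeid)

    find_peer(739512321, "h01")

If I remember right, the upstream fix makes the cache discard the stale
entry instead of warning about it forever.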

> 
> ### So why shouldn't the same node have the same nodeid?
> 
> Jan 20 17:12:38 h10 attrd[13260]:  warning: crm_dump_peer_hash: 
> crm_find_peer: Node 84939948/h05 = 0x61ae90 - 
> b6cabbb3-8332-4903-85be-0c06272755ac
> Jan 20 17:12:38 h10 attrd[13260]:  warning: crm_dump_peer_hash: 
> crm_find_peer: Node 17831084/h01 = 0x61e300 - 
> 11693f38-8125-45f2-b397-86136d5894a4
> Jan 20 17:12:38 h10 attrd[13260]:  warning: crm_dump_peer_hash: 
> crm_find_peer: Node 739512330/h10 = 0x614400 - 
> 302e33d8-7cee-4f3b-97da-b38f0d51b0f6
> 
> ### above are the three nodes of the cluster
> 
> Jan 20 17:12:38 h10 attrd[13260]:     crit: crm_find_peer: Node 739512321 and 
> 17831084 share the same name 'h01'
> 
> ### Now there are different nodeids, it seems...
> 
> Jan 20 17:12:38 h10 attrd[13260]:  warning: crm_find_peer: Node 'h01' and 
> 'h01' share the same cluster nodeid: 739512321
> Jan 20 17:12:38 h10 cib[13257]:  warning: crm_find_peer: Node 'h01' and 'h01' 
> share the same cluster nodeid: 739512321
> 
> ### The same again...
> 
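On the different ids: with the classic plugin, corosync auto-derives the
nodeid from the ring0 IPv4 address when none is set explicitly, so if h01
came back bound to a different address (or picked up an explicit nodeid
during the update), it would announce itself under a new id while the old
entry for 'h01' was still cached. A quick way to sanity-check that is to
decode the ids as addresses; a small sketch (it prints both byte orders,
since which one you see depends on the host):

    # Decode corosync nodeids as IPv4 addresses to see which interface
    # they were likely derived from.
    import struct
    import ipaddress

    for nodeid in (17831084, 84939948, 739512321, 739512330):
        raw = struct.pack(">I", nodeid)  # the id as four raw bytes
        print("%10d  %-15s %s" % (nodeid,
                                  ipaddress.IPv4Address(raw),
                                  ipaddress.IPv4Address(raw[::-1])))

One of the two columns should line up with real ring0 addresses on your
network; if 739512321 maps to a different interface than 17831084 did,
that is where the second identity for 'h01' came from.
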
> (pacemaker-1.1.11-0.7.53, corosync-1.4.7-0.19.6)
> 
> As a result, node h01 is offline now. Before the software update, the node 
> was a member of the cluster.
> 
> On node h01 I see messages like these:
> cib[7439]:   notice: get_node_name: Could not obtain a node name for classic 
> openais (with plugin) nodeid 84939948
> cib[7439]:   notice: crm_update_peer_state: plugin_handle_membership: Node 
> (null)[84939948] - state is now member (was (null))
> cib[7439]:   notice: get_node_name: Could not obtain a node name for classic 
> openais (with plugin) nodeid 739512330
> cib[7439]:   notice: crm_update_peer_state: plugin_handle_membership: Node 
> (null)[739512330] - state is now member (was (null))
> crmd[7444]:  warning: crmd_cs_dispatch: Receiving messages from a node we 
> think is dead: rksaph05[84939948]
> crmd[7444]:   notice: get_node_name: Could not obtain a node name for classic 
> openais (with plugin) nodeid 739512330
> corosync[7402]:  [MAIN  ] Completed service synchronization, ready to provide 
> service.
> crmd[7444]:   notice: do_state_transition: State transition S_PENDING -> 
> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE 
> origin=do_cl_join_finalize_respond ]
> 
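The h01-side messages are the same confusion seen from the other end: its
caches hold h05 and h10 under ids it cannot map back to names yet (note
that the 'node we think is dead' line even shows h05 under its full name,
rksaph05), so membership events arrive as nameless (null) entries until
the names are learned.
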
> An attempt to restart openais hung with these messages:
> attrd[7442]:   notice: attrd_perform_update: Sent update 7: 
> shutdown=1421771193
> corosync[7402]:  [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd 
> (pid=7444, seq=6) to terminate...
> [message repeats]
> 
> So I killed crmd (pid 7444), and openais shut down.
> Unfortunately the problem still persists...
> 
> Regards,
> Ulrich
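
Since the duplicate survives daemon restarts, the stale entry is almost
certainly still in the CIB node list and/or the caches on the other nodes.
Apart from updating to a build that has the crm_find_peer fix, purging the
old entry with crm_node -R (followed by the stale id or the node name; I
do not remember offhand which of the two it wants for plugin-based
clusters, and --force may be needed) on a healthy node should stop the
warnings.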

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
