> On 21 Jan 2015, at 3:38 am, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:
>
> Hi!
>
> When a SLES11SP3 node joined a 3-node cluster after a reboot (and preceding
> update), a node with up-to-date software showed these messages (I feel these
> should not appear):
>
> Jan 20 17:12:38 h10 corosync[13220]: [MAIN ] Completed service synchronization, ready to provide service.
> Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
> Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
> Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
We fixed this upstream a little while back. I think the fix even came from SUSE.

> ### So why shouldn't two entries for the same node share a nodeid?
>
> Jan 20 17:12:38 h10 attrd[13260]: warning: crm_dump_peer_hash: crm_find_peer: Node 84939948/h05 = 0x61ae90 - b6cabbb3-8332-4903-85be-0c06272755ac
> Jan 20 17:12:38 h10 attrd[13260]: warning: crm_dump_peer_hash: crm_find_peer: Node 17831084/h01 = 0x61e300 - 11693f38-8125-45f2-b397-86136d5894a4
> Jan 20 17:12:38 h10 attrd[13260]: warning: crm_dump_peer_hash: crm_find_peer: Node 739512330/h10 = 0x614400 - 302e33d8-7cee-4f3b-97da-b38f0d51b0f6
>
> ### above are the three nodes of the cluster
>
> Jan 20 17:12:38 h10 attrd[13260]: crit: crm_find_peer: Node 739512321 and 17831084 share the same name 'h01'
>
> ### Now there are different nodeids, it seems...
>
> Jan 20 17:12:38 h10 attrd[13260]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
> Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
>
> ### The same again...
>
> (pacemaker-1.1.11-0.7.53, corosync-1.4.7-0.19.6)
>
> As a result, node h01 is now offline. Before the software update, the node was a member of the cluster.
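FWIW, if I recall correctly, classic openais (with plugin) clusters auto-generate the nodeid from the ring0 IPv4 address when no nodeid is set in the config: the four address bytes are read as a native uint32, so on little-endian (x86) nodes the value comes out byte-swapped. Under that assumption (and it is an assumption about your setup), you can decode the ids from the crm_dump_peer_hash output back to addresses, which may hint at where the stale 'h01' entry came from:

```python
import socket
import struct

def nodeid_to_ip(nodeid: int) -> str:
    """Decode an auto-generated corosync 1.x nodeid back to an IPv4 address.

    Assumes the nodeid was derived from the ring0 address on a
    little-endian host, i.e. the four address bytes were read as a
    native uint32; with an explicitly configured nodeid this decoding
    is meaningless.
    """
    return socket.inet_ntoa(struct.pack("<I", nodeid))

# nodeids taken from the crm_dump_peer_hash / crm_find_peer log lines above
for nodeid in (84939948, 17831084, 739512330, 739512321):
    print(nodeid, "->", nodeid_to_ip(nodeid))
```

If the decoded addresses for h01, h05 and h10 match their real interfaces but the conflicting id decodes to something else, that would point at a stale or wrongly-bound address being recorded for h01.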
> On node h01 I see messages like these:
>
> cib[7439]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 84939948
> cib[7439]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
> cib[7439]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512330
> cib[7439]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[739512330] - state is now member (was (null))
> crmd[7444]: warning: crmd_cs_dispatch: Receiving messages from a node we think is dead: rksaph05[84939948]
> crmd[7444]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512330
> corosync[7402]: [MAIN ] Completed service synchronization, ready to provide service.
> crmd[7444]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
>
> An attempt to restart openais hung with these messages:
>
> attrd[7442]: notice: attrd_perform_update: Sent update 7: shutdown=1421771193
> corosync[7402]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=7444, seq=6) to terminate...
> [message repeats]
>
> So I killed crmd (pid 7444) and openais shut down.
> Unfortunately the problem still persists...
>
> Regards,
> Ulrich

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems