Hi!

When a SLES11 SP3 node rejoined a 3-node cluster after a reboot (preceded by a
software update), a node with up-to-date software logged these messages (which
I feel should not appear):

Jan 20 17:12:38 h10 corosync[13220]:  [MAIN  ] Completed service 
synchronization, ready to provide service.
Jan 20 17:12:38 h10 cib[13257]:  warning: crm_find_peer: Node 'h01' and 'h01' 
share the same cluster nodeid: 739512321
Jan 20 17:12:38 h10 cib[13257]:  warning: crm_find_peer: Node 'h01' and 'h01' 
share the same cluster nodeid: 739512321
Jan 20 17:12:38 h10 cib[13257]:  warning: crm_find_peer: Node 'h01' and 'h01' 
share the same cluster nodeid: 739512321

### So why shouldn't one and the same node have the same nodeid?

Jan 20 17:12:38 h10 attrd[13260]:  warning: crm_dump_peer_hash: crm_find_peer: 
Node 84939948/h05 = 0x61ae90 - b6cabbb3-8332-4903-85be-0c06272755ac
Jan 20 17:12:38 h10 attrd[13260]:  warning: crm_dump_peer_hash: crm_find_peer: 
Node 17831084/h01 = 0x61e300 - 11693f38-8125-45f2-b397-86136d5894a4
Jan 20 17:12:38 h10 attrd[13260]:  warning: crm_dump_peer_hash: crm_find_peer: 
Node 739512330/h10 = 0x614400 - 302e33d8-7cee-4f3b-97da-b38f0d51b0f6

### above are the three nodes of the cluster

Jan 20 17:12:38 h10 attrd[13260]:     crit: crm_find_peer: Node 739512321 and 
17831084 share the same name 'h01'

### Now it seems there are two different nodeids for h01...

Jan 20 17:12:38 h10 attrd[13260]:  warning: crm_find_peer: Node 'h01' and 'h01' 
share the same cluster nodeid: 739512321
Jan 20 17:12:38 h10 cib[13257]:  warning: crm_find_peer: Node 'h01' and 'h01' 
share the same cluster nodeid: 739512321

### The same again...

(pacemaker-1.1.11-0.7.53, corosync-1.4.7-0.19.6)
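For what it's worth, the two conflicting ids for h01 can be cross-checked
against a ring0 address. This is a quick sanity-check sketch, assuming the
classic corosync 1.x udp behavior of deriving the nodeid from the ring0 IPv4
address (with the clear_node_high_bit option zeroing the top bit); the address
172.20.16.1 shown in the comments is only a hypothetical example, not a known
fact about this cluster:

```python
import socket
import struct

def nodeid_to_ip_candidates(nodeid):
    """Decode a corosync 1.x nodeid back to candidate ring0 IPv4 addresses.

    Classic udp mode derives the nodeid from the ring0 IPv4 address; with
    clear_node_high_bit, bit 31 of the nodeid is zeroed.  Since the byte
    order used historically varied, try both orders, with and without the
    high bit restored.
    """
    candidates = set()
    for value in (nodeid, nodeid | 0x80000000):
        for packed in (struct.pack(">I", value), struct.pack("<I", value)):
            candidates.add(socket.inet_ntoa(packed))
    return sorted(candidates)

# The two ids logged for h01 above:
print(nodeid_to_ip_candidates(739512321))
print(nodeid_to_ip_candidates(17831084))
```

If both lists contain the same address (e.g. a hypothetical 172.20.16.1), the
two ids would simply be two encodings of the same ring0 address — one raw and
one byte-swapped with the high bit cleared — which would point at inconsistent
nodeid computation between the software versions rather than a real address
change.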

As a result, node h01 is offline now. Before the software update, the node
was a member of the cluster.

On node h01 I see messages like these:
cib[7439]:   notice: get_node_name: Could not obtain a node name for classic 
openais (with plugin) nodeid 84939948
cib[7439]:   notice: crm_update_peer_state: plugin_handle_membership: Node 
(null)[84939948] - state is now member (was (null))
cib[7439]:   notice: get_node_name: Could not obtain a node name for classic 
openais (with plugin) nodeid 739512330
cib[7439]:   notice: crm_update_peer_state: plugin_handle_membership: Node 
(null)[739512330] - state is now member (was (null))
crmd[7444]:  warning: crmd_cs_dispatch: Receiving messages from a node we think 
is dead: rksaph05[84939948]
crmd[7444]:   notice: get_node_name: Could not obtain a node name for classic 
openais (with plugin) nodeid 739512330
corosync[7402]:  [MAIN  ] Completed service synchronization, ready to provide 
service.
crmd[7444]:   notice: do_state_transition: State transition S_PENDING -> 
S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond 
]

An attempt to restart openais hung with these messages:
attrd[7442]:   notice: attrd_perform_update: Sent update 7: shutdown=1421771193
corosync[7402]:  [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd 
(pid=7444, seq=6) to terminate...
[message repeats]

So I killed crmd (pid 7444), and openais shut down.
Unfortunately the problem still persists...

Regards,
Ulrich


_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems