Re: [Linux-HA] Node unable to join cluster

Thomas Thu, 13 Oct 2011 08:23:09 -0700

On 10/13/2011 01:25 PM, Florian Haas wrote:
> On 2011-10-13 12:38, Thomas wrote:
>> Hello,
>>
>> I am using a 3-node corosync/pacemaker cluster setup. One of the nodes
>> repeatedly refuses to join the cluster. Here is a snippet from the log file:
>>
>> Oct 13 12:34:03 sh2 crmd: [2292]: info: crm_timer_popped: Welcomed: 1,
>> Integrated: 0
>> Oct 13 12:34:03 sh2 crmd: [2292]: info: do_state_transition: State
>> transition S_INTEGRATION ->  S_FINALIZE_JOIN [ input=I_INTEGRATED
>> cause=C_TIMER_POPPED origin=crm_timer_popped ]
>> Oct 13 12:34:03 sh2 crmd: [2292]: WARN: do_state_transition: Progressed
>> to state S_FINALIZE_JOIN after C_TIMER_POPPED
>> Oct 13 12:34:03 sh2 crmd: [2292]: WARN: do_state_transition: 1 cluster
>> nodes failed to respond to the join offer.
>> Oct 13 12:34:03 sh2 crmd: [2292]: info: ghash_print_node:   Welcome
>> reply not received from: sh2 6
>> Oct 13 12:34:03 sh2 crmd: [2292]: WARN: do_log: FSA: Input I_ELECTION_DC
>> from do_dc_join_finalize() received in state S_FINALIZE_JOIN
>> Oct 13 12:34:03 sh2 crmd: [2292]: info: do_state_transition: State
>> transition S_FINALIZE_JOIN ->  S_INTEGRATION [ input=I_ELECTION_DC
>> cause=C_FSA_INTERNAL origin=do_dc_join_finalize ]
>> Oct 13 12:34:03 sh2 crmd: [2292]: info: do_dc_join_offer_all: join-7:
>> Waiting on 1 outstanding join acks
>>
>> Any idea what I should look for?
>>
>> Networking (both rings) seems to work just fine.
>
> "Seems to"? Have you confirmed with "corosync-cfgtool -s" on all nodes?
>
>> Versions used are:
>> corosync 1.2.1-4 & pacemaker 1.0.9.1+hg15626-1 from the current version of
>> Debian squeeze.
>
> Please upgrade to the versions in squeeze-backports at your earliest
> convenience.
>
> Cheers,
> Florian
>


Hi Florian,

Thanks for the advice. I managed to sort it out just a minute before you
wrote (at least, I think this was the cause of the malfunction).

The problem was that I had "127.0.1.1 sh2.domain sh2" in /etc/hosts
(an entry apparently inserted by Debian ?!). I guess that because of
this, the node wasn't able to respond to its own join offer, as stated
in the log:
 >> Oct 13 12:34:03 sh2 crmd: [2292]: info: ghash_print_node:   Welcome
 >> reply not received from: sh2 6

Anyway, once I edited /etc/hosts and restarted corosync, the node
joined the cluster almost immediately.
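A hosts entry like the one above can be spotted before corosync is started. Below is a minimal Python sketch (not part of corosync or pacemaker) that scans hosts-file text and flags lines mapping a given hostname, short or fully qualified, to a loopback address such as 127.0.1.1. The function name loopback_entries is hypothetical, and matching on the first label of each name is a rough heuristic assumption.

```python
from __future__ import annotations

import ipaddress


def loopback_entries(hosts_text: str, hostname: str) -> list[tuple[str, str]]:
    """Return (address, line) pairs where `hostname` maps to a loopback address.

    Heuristic (assumption): a name matches if it equals `hostname` or shares
    its first label, so both "sh2" and "sh2.domain" count for host "sh2".
    """
    short = hostname.split(".")[0]
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # skip malformed lines
        # is_loopback covers all of 127.0.0.0/8 (including 127.0.1.1) and ::1
        if ip.is_loopback and any(n == hostname or n.split(".")[0] == short for n in names):
            hits.append((addr, line))
    return hits


# Demo on a sample resembling the problematic Debian entry.
sample = "127.0.0.1\tlocalhost\n127.0.1.1\tsh2.domain sh2\n"
for addr, line in loopback_entries(sample, "sh2"):
    print(f"warning: 'sh2' maps to loopback address {addr}: {line}")
```

On a real node one would pass the contents of /etc/hosts and the output of socket.gethostname() instead of the sample string. The first-label comparison may also flag unrelated hosts that happen to share a short name, so treat its output as a hint rather than a verdict.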

One remark regarding "corosync-cfgtool -s": I am not sure what it
actually checks. It reports that both rings are active with no faults,
but all the information shown relates only to the local node. What does
it tell me about the rest of the nodes in the cluster?

Thanks,
   Thomas

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
