Hi,

On Tue, Jan 18, 2011 at 05:11:13PM +0200, Pavlos Polianidis wrote:
> Dear Andrew
>
> So is there any solution to make the quorum operate?

Quorum is replaced by stonith in 2-node clusters. Other than that,
there seems to be a problem with the number of expected votes:

> >>> Last updated: Thu Jan 13 16:00:15 2011
> >>> Stack: Heartbeat
> >>> Current DC: lsc-node02.velti.net (a7e25657-fb85-4cf1-9d9b-5a21484e1583) -
> >>> partition WITHOUT quorum
> >>> Version: 1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782
> >>> 2 Nodes configured, unknown expected votes
> >>> 0 Resources configured.
> >>> ============

I don't know why that is, but I think somebody else recently reported
the same thing with heartbeat. Not sure though.
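If what you need in the meantime is for the cluster to keep managing
resources despite the missing quorum, something like the following
should do (just a sketch, untested here, assuming the crm shell that
ships with pacemaker 1.0.x):

  # untested sketch; assumes the pacemaker 1.0.x crm shell
  crm configure property no-quorum-policy=ignore    # keep managing resources without quorum
  crm configure property stonith-enabled=true       # rely on fencing instead of quorum
  crm configure property expected-quorum-votes=2    # may clear "unknown expected votes"

Note that no-quorum-policy=ignore is only safe together with working
stonith; without fencing, a split-brain leaves both nodes running
everything.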
Thanks,

Dejan

> Thanks in advance
>
> Pavlos Polianidis
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Andrew Beekhof
> Sent: Tuesday, January 18, 2011 4:19 PM
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] no quorum problem
>
> On Tue, Jan 18, 2011 at 1:27 PM, Pavlos Polianidis
> <[email protected]> wrote:
> > Dear Andrew,
> >
> > I am not sure if I should say quorum daemon, I just mean HA heartbeat
> > quorum.
>
> Well, that's the CCM, which does appear to have started.
>
> >
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Andrew Beekhof
> > Sent: Tuesday, January 18, 2011 2:03 PM
> > To: General Linux-HA mailing list
> > Subject: Re: [Linux-HA] no quorum problem
> >
> > On Tue, Jan 18, 2011 at 1:00 PM, Pavlos Polianidis
> > <[email protected]> wrote:
> >> Thank you Andrew,
> >>
> >> Since I am going to manage resources on 2 nodes (one will be the
> >> failover of the other, but there will be services running on both
> >> nodes, switching between nodes if needed), I cannot use STONITH or
> >> suicide methods.
> >> The nodes will be connected to each other by LAN and also by a
> >> parallel cable to prevent some connection losses.
> >> It seems that for some reason the Heartbeat quorum daemon fails to start
> >
> > wait a second... "quorum daemon"?
> >
> >> and if I change the xml configuration manually to have-quorum="1",
> >> after a few seconds it returns to have-quorum="0" :).
> >>
> >> What I found in the logs regarding ccm is below:
> >>
> >> Jan 17 14:12:12 lsc-node02.velti.net cib: [12449]: info:
> >> crm_cluster_connect: Connecting to Heartbeat
> >> Jan 17 14:12:12 lsc-node02.velti.net cib: [12449]: info: ccm_connect:
> >> Registering with CCM...
> >> Jan 17 14:12:12 lsc-node02.velti.net cib: [12449]: WARN: ccm_connect: CCM
> >> Activation failed
> >> Jan 17 14:12:12 lsc-node02.velti.net cib: [12449]: WARN: ccm_connect: CCM
> >> Connection failed 1 times (30 max)
> >> Jan 17 14:12:12 lsc-node02.velti.net heartbeat: [12414]: info: the send
> >> queue length from heartbeat to client cib is set to 1024
> >> Jan 17 14:12:12 lsc-node02.velti.net crmd: [12453]: info: do_cib_control:
> >> Could not connect to the CIB service: connection failed
> >> Jan 17 14:12:12 lsc-node02.velti.net crmd: [12453]: WARN: do_cib_control:
> >> Couldn't complete CIB registration 1 times... pause and retry
> >> Jan 17 14:12:12 lsc-node02.velti.net crmd: [12453]: info: crmd_init:
> >> Starting crmd's mainloop
> >> Jan 17 14:12:14 lsc-node02.velti.net crmd: [12453]: info:
> >> crm_timer_popped: Wait Timer (I_NULL) just popped!
> >> Jan 17 14:12:15 lsc-node02.velti.net cib: [12449]: info: ccm_connect:
> >> Registering with CCM...
> >> Jan 17 14:12:15 lsc-node02.velti.net cib: [12449]: WARN: ccm_connect: CCM
> >> Activation failed
> >> Jan 17 14:12:15 lsc-node02.velti.net cib: [12449]: WARN: ccm_connect: CCM
> >> Connection failed 2 times (30 max)
> >> Jan 17 14:12:15 lsc-node02.velti.net crmd: [12453]: info: do_cib_control:
> >> Could not connect to the CIB service: connection failed
> >> Jan 17 14:12:15 lsc-node02.velti.net crmd: [12453]: WARN: do_cib_control:
> >> Couldn't complete CIB registration 2 times... pause and retry
> >> Jan 17 14:12:15 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_ha_callback: flush message from lsc-node01.velti.net
> >> Jan 17 14:12:15 lsc-node02.velti.net attrd: [12452]: info:
> >> find_hash_entry: Creating hash entry for probe_complete
> >> Jan 17 14:12:15 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_perform_update: Delaying operation probe_complete=<null>: cib not
> >> connected
> >> Jan 17 14:12:15 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_ha_callback: flush message from lsc-node01.velti.net
> >> Jan 17 14:12:15 lsc-node02.velti.net attrd: [12452]: info:
> >> find_hash_entry: Creating hash entry for last-failure-VIP
> >> Jan 17 14:12:15 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_perform_update: Delaying operation last-failure-VIP=<null>: cib not
> >> connected
> >> Jan 17 14:12:15 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_ha_callback: flush message from lsc-node01.velti.net
> >> Jan 17 14:12:15 lsc-node02.velti.net attrd: [12452]: info:
> >> find_hash_entry: Creating hash entry for terminate
> >> Jan 17 14:12:15 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_perform_update: Delaying operation terminate=<null>: cib not
> >> connected
> >> Jan 17 14:12:16 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_ha_callback: flush message from lsc-node01.velti.net
> >> Jan 17 14:12:16 lsc-node02.velti.net attrd: [12452]: info:
> >> find_hash_entry: Creating hash entry for shutdown
> >> Jan 17 14:12:16 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_perform_update: Delaying operation shutdown=<null>: cib not connected
> >> Jan 17 14:12:16 lsc-node02.velti.net ccm: [12448]: info:
> >> G_main_add_SignalHandler: Added signal handler for signal 15
> >> Jan 17 14:12:16 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_ha_callback: flush message from lsc-node01.velti.net
> >> Jan 17 14:12:16 lsc-node02.velti.net attrd: [12452]: info:
> >> find_hash_entry: Creating hash entry for fail-count-VIP
> >> Jan 17 14:12:16 lsc-node02.velti.net attrd: [12452]: info:
> >> attrd_perform_update: Delaying operation fail-count-VIP=<null>: cib not
> >> connected
> >> Jan 17 14:12:17 lsc-node02.velti.net crmd: [12453]: info:
> >> crm_timer_popped: Wait Timer (I_NULL) just popped!
> >> Jan 17 14:12:18 lsc-node02.velti.net cib: [12449]: info: ccm_connect:
> >> Registering with CCM...
> >>
> >> Jan 17 14:12:22 lsc-node02.velti.net crmd: [12453]: info:
> >> mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> >> Jan 17 14:12:22 lsc-node02.velti.net cib: [12449]: info: mem_handle_event:
> >> Got an event OC_EV_MS_INVALID from ccm
> >> Jan 17 14:12:22 lsc-node02.velti.net crmd: [12453]: info:
> >> mem_handle_event: instance=4, nodes=2, new=2, lost=0, n_idx=0, new_idx=0,
> >> old_idx=6
> >> Jan 17 14:12:22 lsc-node02.velti.net cib: [12449]: info:
> >> mem_handle_event: instance=4, nodes=2, new=2, lost=0, n_idx=0, new_idx=0,
> >> old_idx=6
> >> Jan 17 14:12:22 lsc-node02.velti.net crmd: [12453]: info:
> >> crmd_ccm_msg_callback: Quorum lost after event=INVALID (id=4)
> >> Jan 17 14:12:22 lsc-node02.velti.net cib: [12449]: info:
> >> cib_ccm_msg_callback: Processing CCM event=INVALID (id=4)
> >> Jan 17 14:12:22 lsc-node02.velti.net crmd: [12453]: info:
> >> ccm_event_detail: INVALID: trans=4, nodes=2, new=2, lost=0 n_idx=0,
> >> new_idx=0, old_idx=6
> >> Jan 17 14:12:22 lsc-node02.velti.net cib: [12449]: info: crm_get_peer:
> >> Node lsc-node01.velti.net now has id: 1
> >> Jan 17 14:12:22 lsc-node02.velti.net crmd: [12453]: info:
> >> ccm_event_detail: CURRENT: lsc-node01.velti.net [nodeid=1, born=2]
> >> Jan 17 14:12:22 lsc-node02.velti.net cib: [12449]: info: crm_update_peer:
> >> Node lsc-node01.velti.net: id=1 state=member (new) addr=(null) votes=-1
> >> born=2 seen=4 proc=00000000000000000000000000000100
> >> Jan 17 14:12:22 lsc-node02.velti.net crmd: [12453]: info:
> >> ccm_event_detail: CURRENT: lsc-node02.velti.net [nodeid=3, born=4]
> >> Jan 17 14:12:22 lsc-node02.velti.net cib: [12449]: info:
> >> crm_update_peer_proc: lsc-node01.velti.net.ais is now online
> >> Jan 17 14:12:22 lsc-node02.velti.net crmd: [12453]: info:
> >> ccm_event_detail: NEW: lsc-node01.velti.net [nodeid=1, born=2]
> >> Jan 17 14:12:22 lsc-node02.velti.net cib: [12449]: info:
> >> crm_update_peer_proc: lsc-node01.velti.net.crmd is now online
> >> Jan 17 14:12:23 lsc-node02.velti.net crmd: [12453]: info:
> >> ccm_event_detail: NEW: lsc-node02.velti.net [nodeid=3, born=4]
> >> Jan 17 14:12:23 lsc-node02.velti.net cib: [12449]: info: crm_get_peer:
> >> Node lsc-node02.velti.net now has id: 3
> >> Jan 17 14:12:23 lsc-node02.velti.net crmd: [12453]: info: crm_get_peer:
> >> Node lsc-node01.velti.net now has id: 1
> >> Jan 17 14:12:23 lsc-node02.velti.net cib: [12449]: info: crm_update_peer:
> >> Node lsc-node02.velti.net: id=3 state=member (new) addr=(null)
> >>
> >> Jan 17 16:15:51 lsc-node02.velti.net cib: [16237]: info:
> >> crm_update_peer_proc: lsc-node01.velti.net.crmd is now online
> >> Jan 17 16:15:51 lsc-node02.velti.net crmd: [16241]: info:
> >> ccm_event_detail: NEW: lsc-node01.velti.net [nodeid=1, born=2]
> >> Jan 17 16:15:51 lsc-node02.velti.net crmd: [16241]: info: crm_get_peer:
> >> Node lsc-node01.velti.net now has id: 1
> >> Jan 17 16:15:51 lsc-node02.velti.net ccm: [16236]: debug: recv msg
> >> CCM_TYPE_MEM_LIST from lsc-node02.velti.net, status:[null ptr]
> >> Jan 17 16:15:51 lsc-node02.velti.net crmd: [16241]: info: crm_update_peer:
> >> Node lsc-node01.velti.net: id=1 state=member (new) addr=(null) votes=-1
> >> born=2 seen=2 proc=00000000000000000000000000000200
> >> Jan 17 16:15:51 lsc-node02.velti.net ccm: [16236]: WARN: ccm_state_joined:
> >> received message with unknown cookie, just dropping
> >> Jan 17 16:15:51 lsc-node02.velti.net crmd: [16241]: info:
> >> crm_update_peer_proc: lsc-node01.velti.net.ais is now online
> >> Jan 17 16:15:51 lsc-node02.velti.net ccm: [16236]: debug: dump current
> >> membership 0xf7ed5008
> >> Jan 17 16:15:51 lsc-node02.velti.net crmd: [16241]: debug:
> >> post_cache_update: Updated cache after membership event 2.
> >> Jan 17 16:15:51 lsc-node02.velti.net ccm: [16236]: debug:
> >> leader=lsc-node02.velti.net
> >> Jan 17 16:15:51 lsc-node02.velti.net crmd: [16241]: debug:
> >> post_cache_update: post_cache_update added action A_ELECTION_CHECK to the
> >> FSA
> >> Jan 17 16:15:51 lsc-node02.velti.net ccm: [16236]: debug:
> >> transition=2
> >> Jan 17 16:15:51 lsc-node02.velti.net crmd: [16241]: debug: do_fsa_action:
> >> actions:trace: // A_ELECTION_CHECK
> >> Jan 17 16:15:51 lsc-node02.velti.net ccm: [16236]: debug:
> >> status=CCM_STATE_JOINED
> >> Jan 17 16:15:51 lsc-node02.velti.net crmd: [16241]: debug:
> >> do_election_check: Ignore election check: we not in an election
> >> Jan 17 16:15:51 lsc-node02.velti.net ccm: [16236]: debug:
> >> has_quorum=0
> >> Jan 17 16:15:51 lsc-node02.velti.net ccm: [16236]: debug:
> >> nodename=lsc-node02.velti.net bornon=1
> >> Jan 17 16:15:51 lsc-node02.velti.net ccm: [16236]: debug:
> >> nodename=lsc-node01.velti.net bornon=2
> >> Jan 17 16:16:47 lsc-node02.velti.net crmd: [16241]: info:
> >> crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
> >> Jan 17 16:16:47 lsc-node02.velti.net crmd: [16241]: debug: s_crmd_fsa:
> >> Processing I_DC_TIMEOUT: [ state=S_PENDING cause=C_TIMER_POPPED
> >> origin=crm_timer_popped ]
> >>
> >> Jan 17 13:59:34 lsc-node02.velti.net pengine: [23022]: debug:
> >> unpack_config: Cluster is symmetric - resources can run anywhere by default
> >> Jan 17 13:59:34 lsc-node02.velti.net pengine: [23022]: debug:
> >> unpack_config: Default stickiness: 0
> >> Jan 17 13:59:34 lsc-node02.velti.net pengine: [23022]: notice:
> >> unpack_config: On loss of CCM Quorum: Ignore
> >> Jan 17 13:59:34 lsc-node02.velti.net pengine: [23022]: info:
> >> unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >> Jan 17 13:59:34 lsc-node02.velti.net pengine: [23022]: info:
> >> determine_online_status: Node lsc-node02.velti.net is online
> >>
> >> Jan 17 14:10:52 lsc-node02.velti.net lrmd: [23007]: debug: on_receive_cmd:
> >> the IPC to client [pid:23010] disconnected.
> >> Jan 17 14:10:52 lsc-node02.velti.net crmd: [23010]: info: do_lrm_control:
> >> Disconnected from the LRM
> >> Jan 17 14:10:52 lsc-node02.velti.net lrmd: [23007]: debug:
> >> unregister_client: client crmd [pid:23010] is unregistered
> >> Jan 17 14:10:52 lsc-node02.velti.net crmd: [23010]: debug: do_fsa_action:
> >> actions:trace: // A_CCM_DISCONNECT
> >> Jan 17 14:10:52 lsc-node02.velti.net crmd: [23010]: debug: do_fsa_action:
> >> actions:trace: // A_HA_DISCONNECT
> >> Jan 17 14:10:52 lsc-node02.velti.net ccm: [23005]: info: client
> >> (pid=23010) removed from ccm
> >> Jan 17 14:10:52 lsc-node02.velti.net heartbeat: [22969]: debug: Signing
> >> client 23010 off
> >> Jan 17 14:10:52 lsc-node02.velti.net crmd: [23010]: info: do_ha_control:
> >> Disconnected from Heartbeat
> >> Jan 17 14:10:52 lsc-node02.velti.net heartbeat: [22969]: debug:
> >> G_remove_client(pid=23010, reason='signoff' gsource=0x8836a00) {
> >>
> >> I would attach the full logs but they are too large :)
> >>
> >> Thanks in advance
> >>
> >> Pavlos Polianidis
> >>
> >> Pavlos Polianidis | Technical Support Specialist
> >>
> >> Velti
> >> 44 Kifisias Ave.
> >> 15125 Marousi, Athens, Greece
> >> T +30.210.637.8000
> >> F +30.210.637.8888
> >> M +30.695.506.0133
> >> E [email protected]
> >> www.velti.com
> >>
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of Andrew Beekhof
> >> Sent: Tuesday, January 18, 2011 11:02 AM
> >> To: General Linux-HA mailing list
> >> Subject: Re: [Linux-HA] no quorum problem
> >>
> >> On Thu, Jan 13, 2011 at 3:17 PM, Pavlos Polianidis
> >> <[email protected]> wrote:
> >>> Hello,
> >>>
> >>> Currently I have installed heartbeat 3.0.2-2.el5 x86_64 and pacemaker
> >>> 1.0.7-4.el5 x86_64 on a CentOS release 5.3 x86_64 machine using yum
> >>> repositories.
> >>>
> >>> My configuration is the below:
> >>>
> >>> ha.cf:
> >>>
> >>> debugfile /var/log/ha-debug
> >>> logfile /var/log/ha-log
> >>> logfacility local0
> >>> compression_threshold 2
> >>> node lsc-node01
> >>> node lsc-node02
> >>> debug 1
> >>> use_logd false
> >>> logfacility daemon
> >>> traditional_compression off
> >>> compression bz2
> >>> coredumps true
> >>> udpport 694
> >>> bcast eth0
> >>> autojoin any
> >>> keepalive 1
> >>> warntime 10
> >>> deadtime 35
> >>> initdead 40
> >>> max_rexmit_delay 10000
> >>> crm respawn
> >>>
> >>> but the output of the crm_mon command is the below:
> >>>
> >>> Last updated: Thu Jan 13 16:00:15 2011
> >>> Stack: Heartbeat
> >>> Current DC: lsc-node02.velti.net (a7e25657-fb85-4cf1-9d9b-5a21484e1583) -
> >>> partition WITHOUT quorum
> >>> Version: 1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782
> >>> 2 Nodes configured, unknown expected votes
> >>> 0 Resources configured.
> >>> ============
> >>>
> >>> Online: [ lsc-node02.velti.net lsc-node01.velti.net ]
> >>>
> >>> Previously I experimented with the latest versions of heartbeat and
> >>> pacemaker, and I downgraded to the current versions because I had the
> >>> same problem and read in the forums that some versions might have bugs.
> >>>
> >>> In the debug log I see the below entry:
> >>>
> >>> WARN: cluster_status: We do not have quorum - fencing and resource
> >>> management disabled
> >>>
> >>> In the log:
> >>>
> >>> Jan 13 15:53:34 lsc-node02.velti.net crmd: [30853]: info:
> >>> populate_cib_nodes_ha: Requesting the list of configured nodes
> >>> Jan 13 15:53:37 lsc-node02.velti.net crmd: [30853]: WARN: get_uuid: Could
> >>> not calculate UUID for lsc-node02
> >>> Jan 13 15:53:37 lsc-node02.velti.net crmd: [30853]: WARN:
> >>> populate_cib_nodes_ha: Node lsc-node02: no uuid found
> >>> Jan 13 15:53:38 lsc-node02.velti.net crmd: [30853]: WARN: get_uuid: Could
> >>> not calculate UUID for lsc-node01
> >>> Jan 13 15:53:38 lsc-node02.velti.net crmd: [30853]: WARN:
> >>> populate_cib_nodes_ha: Node lsc-node01: no uuid found
> >>> Jan 13 15:53:38 lsc-node02.velti.net crmd: [30853]: info:
> >>> do_state_transition: All 1 cluster nodes are eligible to run resources.
> >>> Jan 13 15:53:38 lsc-node02.velti.net crmd: [30853]: info:
> >>> do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
> >>> Jan 13 15:53:38 lsc-node02.velti.net crmd: [30853]: info:
> >>> crm_update_quorum: Updating quorum status to false (call=22)
> >>> Jan 13 15:53:38 lsc-node02.velti.net attrd: [30852]: info:
> >>> attrd_local_callback: Sending full refresh (origin=crmd)
> >>> Jan 13 15:53:38 lsc-node02.velti.net cib: [30849]: info:
> >>> cib_process_request: Operation complete: op cib_modify for section nodes
> >>> (origin=local/crmd/20, version=0.18.1): ok (rc=0)
> >>>
> >>> I did not have the same issue when I tried this on CentOS 5.3 i386.
> >>>
> >>> Can anyone advise?
> >>>
> >>> What may be the consequences if no-quorum-policy is set to ignore?
> >>
> >> Well, you'll be in trouble if you get a split-brain - but no more so
> >> than usual, since heartbeat will normally always claim it has quorum
> >> in a two-node cluster.
> >>
> >> What do the heartbeat/ccm logs say?

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
