If communication were working properly, there would be two nodes listed in the CIB (since it's populated from Heartbeat's view of the cluster). There is only one, so it isn't working.
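As a quick check, the node list can be read straight out of the CIB XML. This is only a rough sketch, assuming the Heartbeat 2.1.x layout visible in the `cibadmin -Q` output quoted in this thread (`<node>` entries under `<configuration>/<nodes>`). The function name `cib_node_names` and the trimmed `SAMPLE_CIB` are made up for illustration and are not part of Heartbeat.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample in the shape of the quoted `cibadmin -Q` output.
SAMPLE_CIB = """\
<cib have_quorum="true" num_peers="1">
  <configuration>
    <crm_config/>
    <nodes>
      <node id="aa7f65f8-3aa6-4db4-a262-f52e798038d9" uname="node2" type="normal"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
  <status/>
</cib>
"""


def cib_node_names(cib_xml):
    """Return the uname of every <node> under <configuration>/<nodes>."""
    root = ET.fromstring(cib_xml)
    return [node.get("uname") for node in root.findall("./configuration/nodes/node")]


if __name__ == "__main__":
    names = cib_node_names(SAMPLE_CIB)
    print(f"{len(names)} node(s) in CIB: {', '.join(names)}")
    if len(names) < 2:
        # Assumption: a two-node cluster is expected, as in this thread.
        print("peer missing from CIB: check the heartbeat communication path first")
```

On a live node the string would come from `cibadmin -Q` instead of the sample. Anything fewer than two names means the peers have not seen each other yet.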
Also, manually setting quorum will have no effect and certainly won't bring the nodes online. In fact it's the reverse: only by bringing the nodes online will you get quorum (which is a property of the cluster, not an option).

I'd start by setting up the cluster to use a regular interface instead of a bonded one... see if you can get the simple case working before moving on to the advanced stuff.

On Wed, Oct 15, 2008 at 01:20, Thomas Halinka <[EMAIL PROTECTED]> wrote:
> Hi together,
>
> i would like to setup a 2-node-heartbeat-Cluster with debian lenny, but
> the nodes are always offline.
>
> I installed hb2 from packages with
>
> # apt-get install heartbeat-2 heartbeat-2-gui
>
> created minimal ha.cf
>
> use_logd yes
> bcast bond1
> node node1 node2
> crm yes
>
> and authkeys. When i start heartbeat the nodes see only himselves and
> not each other
>
> ============
> Last updated: Wed Oct 15 00:59:47 2008
> Current DC: kzvxen2 (aa7f65f8-3aa6-4db4-a262-f52e798038d9)
> 1 Nodes configured.
> 0 Resources configured.
> ============
>
> Node: node2 (aa7f65f8-3aa6-4db4-a262-f52e798038d9): online
>
> the same on node1.
>
> cibadmin -Q
> <cib generated="true" admin_epoch="0" epoch="3" num_updates="7"
>      have_quorum="true" ignore_dtd="false" num_peers="1"
>      cib_feature_revision="2.0" cib-last-written="Tue Oct 14 20:18:10 2008"
>      ccm_transition="1" dc_uuid="aa7f65f8-3aa6-4db4-a262-f52e798038d9">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <attributes>
>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3"/>
>         </attributes>
>       </cluster_property_set>
>       <cluster_property_set id="bootstrap">
>         <attributes>
>           <nvpair id="bootstrap01" name="transition-idle-timeout" value="60"/>
>           <nvpair id="bootstrap02" name="default-resource-stickiness" value="INFINITY"/>
>           <nvpair id="bootstrap03" name="default-resource-failure-stickiness" value="-500"/>
>           <nvpair id="bootstrap04" name="stonith-enabled" value="true"/>
>           <nvpair id="bootstrap05" name="stonith-action" value="reboot"/>
>           <nvpair id="bootstrap06" name="symmetric-cluster" value="true"/>
>           <nvpair id="bootstrap07" name="no-quorum-policy" value="stop"/>
>           <nvpair id="bootstrap08" name="stop-orphan-resources" value="true"/>
>           <nvpair id="bootstrap09" name="stop-orphan-actions" value="true"/>
>           <nvpair id="bootstrap10" name="is-managed-default" value="true"/>
>         </attributes>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="aa7f65f8-3aa6-4db4-a262-f52e798038d9" uname="node2" type="normal"/>
>     </nodes>
>     <resources/>
>     <constraints/>
>   </configuration>
>   <status>
>     <node_state id="aa7f65f8-3aa6-4db4-a262-f52e798038d9" uname="node2"
>                 crmd="online" crm-debug-origin="do_lrm_query" shutdown="0" in_ccm="true"
>                 ha="active" join="member" expected="member">
>       <lrm id="aa7f65f8-3aa6-4db4-a262-f52e798038d9">
>         <lrm_resources/>
>       </lrm>
>       <transient_attributes id="aa7f65f8-3aa6-4db4-a262-f52e798038d9">
>         <instance_attributes id="status-aa7f65f8-3aa6-4db4-a262-f52e798038d9">
>           <attributes>
>             <nvpair id="status-aa7f65f8-3aa6-4db4-a262-f52e798038d9-probe_complete" name="probe_complete" value="true"/>
>           </attributes>
>         </instance_attributes>
>       </transient_attributes>
>     </node_state>
>   </status>
> </cib>
>
> I think the Problem is the quorum which is set to true. So i stoppped
> hb2 removed existing cib, but the nodes still dont see each other. Next
> I tried
>
> # cibadmin -Q > my.xml && sed -i -e 's/have_quorum="true"/have_quorum="false"/' my.xml && cibadmin -U -x my.xml
>
> but still offline.
>
> Anyway communication seems to work
>
> # /usr/share/heartbeat/TestHeartbeatComm fix-communication
> DeleteTestFileOK
> Reloading High-Availability services:
> heartbeat[4375]: 2008/10/15_01:08:17 info: Enabling logging daemon
> heartbeat[4375]: 2008/10/15_01:08:17 info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
> heartbeat[4375]: 2008/10/15_01:08:17 info: Version 2 support: yes
> heartbeat[4375]: 2008/10/15_01:08:17 WARN: Core dumps could be lost if multiple dumps occur.
> heartbeat[4375]: 2008/10/15_01:08:17 WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
> heartbeat[4375]: 2008/10/15_01:08:17 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
> heartbeat[4375]: 2008/10/15_01:08:17 info: Signalling heartbeat pid 4322 to reread config files
> Done.
>
> What am i doing wrong?
>
> Thanx in advance
>
> Thomas
>
> Heartbeat-Version
> # dpkg -l | grep heartbeat
> ii  heartbeat        2.1.3-6  Subsystem for High-Availability Linux
> ii  heartbeat-2      2.1.3-6  Subsystem for High-Availability Linux
> ii  heartbeat-2-gui  2.1.3-6  Provides a gui interface to manage heartbeat
> ii  heartbeat-gui    2.1.3-6  Provides a gui interface to manage heartbeat
>
> #### some log
> Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
> Oct 15 01:14:41 node1 tengine: [5452]: info: te_init: Registering TE UUID: a00bc464-c2a2-438d-b634-35d235dec802
> Oct 15 01:14:41 node1 cib: [5442]: info: cib_null_callback: Setting cib_diff_notify callbacks for tengine: on
> Oct 15 01:14:41 node1 tengine: [5452]: info: set_graph_functions: Setting custom graph functions
> Oct 15 01:14:41 node1 tengine: [5452]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
> Oct 15 01:14:41 node1 tengine: [5452]: info: te_init: Starting tengine
> Oct 15 01:14:41 node1 tengine: [5452]: info: te_connect_stonith: Attempting connection to fencing daemon...
> Oct 15 01:14:41 node1 cib: [5454]: info: write_cib_contents: Wrote version 0.1.1 of the CIB to disk (digest: 1fe4ea76bbea0a808dcece37b7be7aec)
> Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
> Oct 15 01:14:41 node1 crmd: [5446]: info: update_dc: Set DC to node1 (2.0)
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> Oct 15 01:14:42 node1 cib: [5442]: info: sync_our_cib: Syncing CIB to all peers
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
> Oct 15 01:14:42 node1 attrd: [5445]: info: attrd_local_callback: Sending full refresh
> Oct 15 01:14:42 node1 crmd: [5446]: info: update_attrd: Connecting to attrd...
> Oct 15 01:14:42 node1 tengine: [5452]: info: te_connect_stonith: Connected
> Oct 15 01:14:42 node1 crmd: [5446]: info: update_dc: Set DC to node1 (2.0)
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_dc_join_ack: join-1: Updating node state to member for node1
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Oct 15 01:14:42 node1 tengine: [5452]: info: update_abort_priority: Abort priority upgraded to 1000000
> Oct 15 01:14:42 node1 tengine: [5452]: info: update_abort_priority: 'DC Takeover' abort superceeded
> Oct 15 01:14:42 node1 pengine: [5453]: info: determine_online_status: Node node1 is online
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> Oct 15 01:14:42 node1 tengine: [5452]: info: unpack_graph: Unpacked transition 0: 1 actions in 1 synapses
> Oct 15 01:14:42 node1 tengine: [5452]: info: send_rsc_command: Initiating action 2: probe_complete on node1
> Oct 15 01:14:42 node1 tengine: [5452]: info: run_graph: Transition 0: (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> Oct 15 01:14:42 node1 tengine: [5452]: info: notify_crmd: Transition 0 status: te_complete - <null>
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> Oct 15 01:14:42 node1 tengine: [5452]: info: extract_event: Aborting on transient_attributes changes for ad0cd857-d765-44f2-a17d-0b689b63774b
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> Oct 15 01:14:42 node1 tengine: [5452]: info: update_abort_priority: Abort priority upgraded to 1000000
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Oct 15 01:14:42 node1 pengine: [5453]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-44.bz2
> Oct 15 01:14:42 node1 pengine: [5453]: info: determine_online_status: Node node1 is online
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> Oct 15 01:14:42 node1 tengine: [5452]: info: unpack_graph: Unpacked transition 1: 0 actions in 0 synapses
> Oct 15 01:14:42 node1 tengine: [5452]: info: run_graph: Transition 1: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> Oct 15 01:14:42 node1 tengine: [5452]: info: notify_crmd: Transition 1 status: te_complete - <null>
> Oct 15 01:14:42 node1 pengine: [5453]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-45.bz2
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
