Hi all,
I would like to set up a two-node Heartbeat cluster on Debian lenny,
but the nodes always show each other as offline.
I installed hb2 from packages with
# apt-get install heartbeat-2 heartbeat-2-gui
and created a minimal ha.cf:
use_logd yes
bcast bond1
node node1 node2
crm yes
plus an authkeys file. When I start Heartbeat, each node sees only
itself, not the other:
============
Last updated: Wed Oct 15 00:59:47 2008
Current DC: kzvxen2 (aa7f65f8-3aa6-4db4-a262-f52e798038d9)
1 Nodes configured.
0 Resources configured.
============
Node: node2 (aa7f65f8-3aa6-4db4-a262-f52e798038d9): online
The same output appears on node1.
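The authkeys file was created along these lines on both nodes (the secret below is a placeholder for the real one, and I use a scratch path here rather than /etc/ha.d/authkeys):

```shell
# Sketch of the authkeys setup; "ReplaceMe" stands in for the real shared secret.
# Both nodes need byte-identical copies, and heartbeat refuses to start unless
# the file is readable only by root (mode 600).
cat > authkeys <<'EOF'
auth 1
1 sha1 ReplaceMe
EOF
chmod 600 authkeys
```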
# cibadmin -Q
<cib generated="true" admin_epoch="0" epoch="3" num_updates="7"
have_quorum="true" ignore_dtd="false" num_peers="1"
cib_feature_revision="2.0" cib-last-written="Tue Oct 14 20:18:10 2008"
ccm_transition="1" dc_uuid="aa7f65f8-3aa6-4db4-a262-f52e798038d9">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="cib-bootstrap-options-dc-version"
name="dc-version" value="2.1.3-node:
552305612591183b1628baa5bc6e903e0f1e26a3"/>
</attributes>
</cluster_property_set>
<cluster_property_set id="bootstrap">
<attributes>
<nvpair id="bootstrap01" name="transition-idle-timeout"
value="60"/>
<nvpair id="bootstrap02" name="default-resource-stickiness"
value="INFINITY"/>
<nvpair id="bootstrap03"
name="default-resource-failure-stickiness" value="-500"/>
<nvpair id="bootstrap04" name="stonith-enabled"
value="true"/>
<nvpair id="bootstrap05" name="stonith-action"
value="reboot"/>
<nvpair id="bootstrap06" name="symmetric-cluster"
value="true"/>
<nvpair id="bootstrap07" name="no-quorum-policy"
value="stop"/>
<nvpair id="bootstrap08" name="stop-orphan-resources"
value="true"/>
<nvpair id="bootstrap09" name="stop-orphan-actions"
value="true"/>
<nvpair id="bootstrap10" name="is-managed-default"
value="true"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes>
<node id="aa7f65f8-3aa6-4db4-a262-f52e798038d9" uname="node2"
type="normal"/>
</nodes>
<resources/>
<constraints/>
</configuration>
<status>
<node_state id="aa7f65f8-3aa6-4db4-a262-f52e798038d9" uname="node2"
crmd="online" crm-debug-origin="do_lrm_query" shutdown="0" in_ccm="true"
ha="active" join="member" expected="member">
<lrm id="aa7f65f8-3aa6-4db4-a262-f52e798038d9">
<lrm_resources/>
</lrm>
<transient_attributes id="aa7f65f8-3aa6-4db4-a262-f52e798038d9">
<instance_attributes
id="status-aa7f65f8-3aa6-4db4-a262-f52e798038d9">
<attributes>
<nvpair
id="status-aa7f65f8-3aa6-4db4-a262-f52e798038d9-probe_complete"
name="probe_complete" value="true"/>
</attributes>
</instance_attributes>
</transient_attributes>
</node_state>
</status>
</cib>
I suspect the problem is the quorum, which is set to true. So I stopped
hb2 and removed the existing CIB, but the nodes still don't see each
other. Next I tried
# cibadmin -Q > my.xml
# sed -i -e 's/have_quorum="true"/have_quorum="false"/' my.xml
# cibadmin -U -x my.xml
but the nodes are still offline.
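To rule out a typo in the substitution itself, the same sed expression behaves as expected on a one-line sample (a scratch file, not the live CIB):

```shell
# Verify the substitution on a sample line before touching the real CIB dump
echo '<cib have_quorum="true" num_peers="1">' > sample.xml
sed -i -e 's/have_quorum="true"/have_quorum="false"/' sample.xml
cat sample.xml
# -> <cib have_quorum="false" num_peers="1">
```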
Anyway, communication seems to work:
# /usr/share/heartbeat/TestHeartbeatComm fix-communication
DeleteTestFileOK
Reloading High-Availability services:
heartbeat[4375]: 2008/10/15_01:08:17 info: Enabling logging daemon
heartbeat[4375]: 2008/10/15_01:08:17 info: logfile and debug file are
those specified in logd config file (default /etc/logd.cf)
heartbeat[4375]: 2008/10/15_01:08:17 info: Version 2 support: yes
heartbeat[4375]: 2008/10/15_01:08:17 WARN: Core dumps could be lost if
multiple dumps occur.
heartbeat[4375]: 2008/10/15_01:08:17 WARN: Consider setting non-default
value in /proc/sys/kernel/core_pattern (or equivalent) for maximum
supportability
heartbeat[4375]: 2008/10/15_01:08:17 WARN: Consider
setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
heartbeat[4375]: 2008/10/15_01:08:17 info: Signalling heartbeat pid 4322
to reread config files
Done.
What am I doing wrong?
Thanks in advance,
Thomas
Heartbeat version:
# dpkg -l | grep heartbeat
ii heartbeat 2.1.3-6
Subsystem for High-Availability Linux
ii heartbeat-2 2.1.3-6
Subsystem for High-Availability Linux
ii heartbeat-2-gui 2.1.3-6
Provides a gui interface to manage heartbeat
ii heartbeat-gui 2.1.3-6
Provides a gui interface to manage heartbeat
#### Some log output from node1:
Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml
(digest: /var/lib/heartbeat/crm/cib.xml.sig)
Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml
(digest: /var/lib/heartbeat/crm/cib.xml.sig)
Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml.last
(digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
Oct 15 01:14:41 node1 tengine: [5452]: info: te_init: Registering TE
UUID: a00bc464-c2a2-438d-b634-35d235dec802
Oct 15 01:14:41 node1 cib: [5442]: info: cib_null_callback: Setting
cib_diff_notify callbacks for tengine: on
Oct 15 01:14:41 node1 tengine: [5452]: info: set_graph_functions:
Setting custom graph functions
Oct 15 01:14:41 node1 tengine: [5452]: info: unpack_graph: Unpacked
transition -1: 0 actions in 0 synapses
Oct 15 01:14:41 node1 tengine: [5452]: info: te_init: Starting tengine
Oct 15 01:14:41 node1 tengine: [5452]: info: te_connect_stonith:
Attempting connection to fencing daemon...
Oct 15 01:14:41 node1 cib: [5454]: info: write_cib_contents: Wrote
version 0.1.1 of the CIB to disk (digest:
1fe4ea76bbea0a808dcece37b7be7aec)
Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml
(digest: /var/lib/heartbeat/crm/cib.xml.sig)
Oct 15 01:14:41 node1 cib: [5454]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml.last
(digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
Oct 15 01:14:41 node1 crmd: [5446]: info: update_dc: Set DC to node1
(2.0)
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Oct 15 01:14:42 node1 cib: [5442]: info: sync_our_cib: Syncing CIB to
all peers
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: All 1
cluster nodes responded to the join offer.
Oct 15 01:14:42 node1 attrd: [5445]: info: attrd_local_callback: Sending
full refresh
Oct 15 01:14:42 node1 crmd: [5446]: info: update_attrd: Connecting to
attrd...
Oct 15 01:14:42 node1 tengine: [5452]: info: te_connect_stonith:
Connected
Oct 15 01:14:42 node1 crmd: [5446]: info: update_dc: Set DC to node1
(2.0)
Oct 15 01:14:42 node1 crmd: [5446]: info: do_dc_join_ack: join-1:
Updating node state to member for node1
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State
transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
cause=C_FSA_INTERNAL origin=check_join_state ]
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
Oct 15 01:14:42 node1 tengine: [5452]: info: update_abort_priority:
Abort priority upgraded to 1000000
Oct 15 01:14:42 node1 tengine: [5452]: info: update_abort_priority: 'DC
Takeover' abort superceeded
Oct 15 01:14:42 node1 pengine: [5453]: info: determine_online_status:
Node node1 is online
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=route_message ]
Oct 15 01:14:42 node1 tengine: [5452]: info: unpack_graph: Unpacked
transition 0: 1 actions in 1 synapses
Oct 15 01:14:42 node1 tengine: [5452]: info: send_rsc_command:
Initiating action 2: probe_complete on node1
Oct 15 01:14:42 node1 tengine: [5452]: info: run_graph: Transition 0:
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Oct 15 01:14:42 node1 tengine: [5452]: info: notify_crmd: Transition 0
status: te_complete - <null>
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_IPC_MESSAGE origin=route_message ]
Oct 15 01:14:42 node1 tengine: [5452]: info: extract_event: Aborting on
transient_attributes changes for ad0cd857-d765-44f2-a17d-0b689b63774b
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_IPC_MESSAGE origin=route_message ]
Oct 15 01:14:42 node1 tengine: [5452]: info: update_abort_priority:
Abort priority upgraded to 1000000
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
Oct 15 01:14:42 node1 pengine: [5453]: info: process_pe_message:
Transition 0: PEngine Input stored
in: /var/lib/heartbeat/pengine/pe-input-44.bz2
Oct 15 01:14:42 node1 pengine: [5453]: info: determine_online_status:
Node node1 is online
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=route_message ]
Oct 15 01:14:42 node1 tengine: [5452]: info: unpack_graph: Unpacked
transition 1: 0 actions in 0 synapses
Oct 15 01:14:42 node1 tengine: [5452]: info: run_graph: Transition 1:
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Oct 15 01:14:42 node1 crmd: [5446]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_IPC_MESSAGE origin=route_message ]
Oct 15 01:14:42 node1 tengine: [5452]: info: notify_crmd: Transition 1
status: te_complete - <null>
Oct 15 01:14:42 node1 pengine: [5453]: info: process_pe_message:
Transition 1: PEngine Input stored
in: /var/lib/heartbeat/pengine/pe-input-45.bz2
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems