Hi

In the end I wiped the SLES11 installation and installed openSUSE 11.1
on my test machines.

The following packages are currently installed:
- cluster-glue    1.0-12.1
- heartbeat       3.0.0-33.2
- libdlm          2.99.08-31.1
- libdlm2         2.99.08-31.1
- libglue1        1.0-12.1
- libopenais2     0.80.5-15.1
- libpacemaker3   1.0.5-20.1
- lvm2            2.02.39-43.8
- openais         0.80.5-15.1
- pacemaker       1.0.5-20.1
- pacemaker-pygui 1.99.2-5.2
- resource-agents 1.0-31.4

After creating /etc/ha.d/authkeys and /etc/ha.d/ha.cf I tried to start
heartbeat. The only difference between the two configuration files is the
IP in the entry "". But strange things happen! On one machine heartbeat
starts and runs as expected, and crm_mon shows the current state. On the
other machine nothing happens for a few minutes, and then the machine is
rebooted. Here is the relevant part of the logfile:

Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: Version 2 support:
yes
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
hacluster /usr/lib/heartbeat/ccm
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
hacluster /usr/lib/heartbeat/cib
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
root /usr/lib/heartbeat/lrmd -r
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
root /usr/lib/heartbeat/stonithd
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
hacluster /usr/lib/heartbeat/attrd
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
hacluster /usr/lib/heartbeat/crmd
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: AUTH: i=1: key =
0x8113f48, auth=0xb7f11050, authname=sha1
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: WARN: Core dumps could
be lost if multiple dumps occur.
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: WARN: Consider
setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: WARN: Logging daemon is
disabled --enabling logging daemon is recommended
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info:
**************************
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: Configuration
validated. Starting heartbeat 2.99.4
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: Heartbeat Hg
Version: node: b37cbb1b036c742f0950977495faca78e68aa53d
Sep 17 13:17:38 opensuse-master heartbeat: [4156]: info: heartbeat: version
2.99.4
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: Heartbeat
generation: 1253182976
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: glib: ucast: write
socket priority set to IPTOS_LOWDELAY on eth1
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: glib: ucast: bound
send socket to device: eth1
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: glib: ucast: bound
receive socket to device: eth1
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: glib: ucast:
started on port 694 interface eth1 to 172.17.26.152
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info:
G_main_add_TriggerHandler: Added signal manual handler
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info:
G_main_add_TriggerHandler: Added signal manual handler
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: Stack hogger
failed 0xffffffff
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: Local status now
set to: 'up'
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: Managed
write_hostcachedata process 4161 exited with return code 0.
Sep 17 13:17:40 opensuse-master heartbeat: [4158]: info: Stack hogger
failed 0xffffffff
Sep 17 13:17:40 opensuse-master heartbeat: [4159]: info: Stack hogger
failed 0xffffffff
Sep 17 13:17:40 opensuse-master heartbeat: [4160]: info: Stack hogger
failed 0xffffffff
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: WARN: node
opensuse-redundanz: is dead
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Comm_now_up():
updating status to active
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Local status now
set to: 'active'
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/ccm" (90,90)
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/cib" (90,90)
Sep 17 13:19:39 opensuse-master heartbeat: [4217]: info: Starting
"/usr/lib/heartbeat/ccm" as uid 90  gid 90 (pid 4217)
Sep 17 13:19:39 opensuse-master heartbeat: [4218]: info: Starting
"/usr/lib/heartbeat/cib" as uid 90  gid 90 (pid 4218)
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/lrmd -r" (0,0)
Sep 17 13:19:39 opensuse-master heartbeat: [4219]: info: Starting
"/usr/lib/heartbeat/lrmd -r" as uid 0  gid 0 (pid 4219)
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/stonithd" (0,0)
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/attrd" (90,90)
Sep 17 13:19:39 opensuse-master heartbeat: [4220]: info: Starting
"/usr/lib/heartbeat/stonithd" as uid 0  gid 0 (pid 4220)
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/crmd" (90,90)
Sep 17 13:19:40 opensuse-master heartbeat: [4221]: info: Starting
"/usr/lib/heartbeat/attrd" as uid 90  gid 90 (pid 4221)
Sep 17 13:19:40 opensuse-master cib: [4218]: info:
Invoked: /usr/lib/heartbeat/cib
Sep 17 13:19:40 opensuse-master cib: [4218]: info:
G_main_add_TriggerHandler: Added signal manual handler
Sep 17 13:19:40 opensuse-master heartbeat: [4222]: info: Starting
"/usr/lib/heartbeat/crmd" as uid 90  gid 90 (pid 4222)
Sep 17 13:19:40 opensuse-master cib: [4218]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Sep 17 13:19:40 opensuse-master ccm: [4217]: info: Hostname:
opensuse-master
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info:
G_main_add_SignalHandler: Added signal handler for signal 15
Sep 17 13:19:40 opensuse-master attrd: [4221]: info:
Invoked: /usr/lib/heartbeat/attrd
Sep 17 13:19:40 opensuse-master attrd: [4221]: info: main: Starting up
Sep 17 13:19:40 opensuse-master attrd: [4221]: info: crm_cluster_connect:
Unsupported cluster stack: (null)
Sep 17 13:19:40 opensuse-master attrd: [4221]: ERROR: main: HA Signon
failed
Sep 17 13:19:40 opensuse-master attrd: [4221]: info: main: Cluster
connection active
Sep 17 13:19:40 opensuse-master attrd: [4221]: info: main: Accepting
attribute updates
Sep 17 13:19:40 opensuse-master attrd: [4221]: ERROR: main: Aborting
startup
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN:
Managed /usr/lib/heartbeat/attrd process 4221 exited with return code 100.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: ERROR:
Client /usr/lib/heartbeat/attrd exited with return code 100.
Sep 17 13:19:40 opensuse-master stonithd: [4220]: WARN: Core dumps could be
lost if multiple dumps occur.
Sep 17 13:19:40 opensuse-master cib: [4218]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.xml
(digest: /var/lib/heartbeat/crm/cib.xml.sig)
Sep 17 13:19:40 opensuse-master crmd: [4222]: info:
Invoked: /usr/lib/heartbeat/crmd
Sep 17 13:19:40 opensuse-master crmd: [4222]: info: main: CRM Hg Version:
13f3497959e894e57b8cb24f59c8683346b216e3

Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN: G_CH_dispatch_int:
Dispatch function for API client took too long to execute: 180 ms (> 100
ms) (GSource: 0x813f5e0)
Sep 17 13:19:40 opensuse-master stonithd: [4220]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Sep 17 13:19:40 opensuse-master cib: [4218]: WARN: retrieveCib: Cluster
configuration not found: /var/lib/heartbeat/crm/cib.xml
Sep 17 13:19:40 opensuse-master stonithd: [4220]: WARN: Consider
setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Sep 17 13:19:40 opensuse-master cib: [4218]: WARN: readCibXmlFile: Primary
configuration corrupt or unusable, trying backup...
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Sep 17 13:19:40 opensuse-master stonithd: [4220]: info:
G_main_add_SignalHandler: Added signal handler for signal 10
Sep 17 13:19:40 opensuse-master stonithd: [4220]: info:
G_main_add_SignalHandler: Added signal handler for signal 12
Sep 17 13:19:40 opensuse-master lrmd: [4219]: WARN: Core dumps could be
lost if multiple dumps occur.
Sep 17 13:19:40 opensuse-master stonithd: [4220]: info: Stack hogger failed
0xffffffff
Sep 17 13:19:40 opensuse-master cib: [4218]: WARN: readCibXmlFile:
Continuing with an empty configuration.
Sep 17 13:19:40 opensuse-master lrmd: [4219]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN: G_CH_dispatch_int:
Dispatch function for API client took too long to execute: 120 ms (> 100
ms) (GSource: 0x813f5e0)
Sep 17 13:19:40 opensuse-master stonithd: [4220]: info:
crm_cluster_connect: Unsupported cluster stack: (null)
Sep 17 13:19:40 opensuse-master lrmd: [4219]: WARN: Consider
setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Sep 17 13:19:40 opensuse-master stonithd: [4220]: ERROR: failed to connect
to cluster
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info:
G_main_add_SignalHandler: Added signal handler for signal 10
Sep 17 13:19:40 opensuse-master stonithd: [4220]:
ERROR: /usr/lib/heartbeat/stonithd abnormally abort.
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info:
G_main_add_SignalHandler: Added signal handler for signal 12
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info: Started.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN: G_CH_dispatch_int:
Dispatch function for API client took too long to execute: 260 ms (> 100
ms) (GSource: 0x813f5e0)
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN: G_SIG_dispatch:
Dispatch function for SIGCHLD was delayed 190 ms (> 100 ms) before being
called (GSource: 0x81159b0)
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: info: G_SIG_dispatch:
started at 1718188528 should have started at 1718188509
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN:
Managed /usr/lib/heartbeat/stonithd process 4220 exited with return code
100.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: ERROR:
Client /usr/lib/heartbeat/stonithd exited with return code 100.
Sep 17 13:19:40 opensuse-master crmd: [4222]: info: crmd_init: Starting
crmd
Sep 17 13:19:40 opensuse-master crmd: [4222]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Sep 17 13:19:40 opensuse-master cib: [4218]: info: startCib: CIB
Initialization completed successfully
Sep 17 13:19:40 opensuse-master cib: [4218]: info: crm_cluster_connect:
Unsupported cluster stack: (null)
Sep 17 13:19:40 opensuse-master cib: [4218]: CRIT: cib_init: Cannot sign in
to the cluster... terminating
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN:
Managed /usr/lib/heartbeat/cib process 4218 exited with return code 100.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: ERROR:
Client /usr/lib/heartbeat/cib exited with return code 100.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: EMERG: Rebooting system.
Reason: /usr/lib/heartbeat/cib
Sep 17 13:19:40 opensuse-master ccm: [4217]: info: Break tie for 2 nodes
cluster
Sep 17 13:19:41 opensuse-master ccm: [4217]: info:
G_main_add_SignalHandler: Added signal handler for signal 15
Sep 17 13:19:41 opensuse-master crmd: [4222]: info: do_cib_control: Could
not connect to the CIB service: connection failed
Sep 17 13:19:41 opensuse-master crmd: [4222]: WARN: do_cib_control:
Couldn't complete CIB registration 1 times... pause and retry
Sep 17 13:19:41 opensuse-master crmd: [4222]: info: crmd_init: Starting
crmd's mainloop
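
For a quicker overview, the excerpt above can be reduced to its severe
entries. The following is only a minimal sketch in Python, assuming the log
has been saved as plain text with the mail-wrapped continuation lines still
in place. The function names join_wrapped and severe_entries are made up for
this illustration and are not part of heartbeat or pacemaker.

```python
import re

# A new syslog entry starts with a timestamp such as "Sep 17 13:17:38".
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
# The severity tag follows the "[pid]:" field, e.g. "[4218]: CRIT:".
SEVERITY = re.compile(r"\]: +([A-Za-z]+):")


def join_wrapped(lines):
    """Merge mail-wrapped continuation lines back into single log entries."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines inside the excerpt
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def severe_entries(lines, levels=("ERROR", "CRIT", "EMERG")):
    """Return the joined entries whose severity tag is in `levels`."""
    result = []
    for entry in join_wrapped(lines):
        match = SEVERITY.search(entry)
        if match and match.group(1) in levels:
            result.append(entry)
    return result
```

Applied to the excerpt above, this should leave only the attrd, stonithd and
cib failures and the final EMERG line, which makes the order of events
easier to follow. Passing levels=("WARN", "ERROR", "CRIT", "EMERG") would
include the warnings as well.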

(If it would be useful I can provide the debug output too, but it is quite
a bit longer than the normal output... ;-)

I really don't know what's wrong here. Both machines run openSUSE 11.1 with
the same heartbeat components installed. One of them runs, the other one
dies during startup of heartbeat. Any ideas what to do next?

Regards,

Yves Schumann
Softwareentwicklungsingenieur Security Solutions Division
IT-Koordinator
______________________________
Ascom (Schweiz) AG
http://www.ascom.com

"Walking on water and developing software from a specification are easy if
both are frozen" -- Edward V. Berard
