It seems to be an issue with the Corosync quorum API. Here is the output from corosync.log:
Aug 08 17:18:45 [30364] KNTCLFS001 cib: info: startCib: CIB Initialization completed successfully
Aug 08 17:18:45 [30364] KNTCLFS001 cib: info: get_cluster_type: Cluster type is: 'corosync'
Aug 08 17:18:45 [30364] KNTCLFS001 cib: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Aug 08 17:18:45 [30364] KNTCLFS001 cib: info: init_ais_connection_once: Connection to 'corosync': established
Aug 08 17:18:45 [30364] KNTCLFS001 cib: info: crm_new_peer: Node KNTCLFS001 now has id: 83994816
Aug 08 17:18:45 [30364] KNTCLFS001 cib: info: crm_new_peer: Node 83994816 is now known as KNTCLFS001
Aug 08 17:18:45 [30364] KNTCLFS001 cib: info: cib_init: Starting cib mainloop
Aug 08 17:18:45 [30364] KNTCLFS001 cib: info: set_crm_log_level: New log level: 3 0
Aug 08 17:18:46 [30369] KNTCLFS001 crmd: info: do_cib_control: CIB connection established
Aug 08 17:18:46 [30369] KNTCLFS001 crmd: info: get_cluster_type: Cluster type is: 'corosync'
Aug 08 17:18:46 [30369] KNTCLFS001 crmd: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Aug 08 17:18:46 [30365] KNTCLFS001 stonith-ng: notice: setup_cib: Watching for stonith topology changes
Aug 08 17:18:46 [30365] KNTCLFS001 stonith-ng: info: main: Starting stonith-ng mainloop
Aug 08 17:18:46 [30369] KNTCLFS001 crmd: info: init_ais_connection_once: Connection to 'corosync': established
Aug 08 17:18:46 [30369] KNTCLFS001 crmd: info: crm_new_peer: Node KNTCLFS001 now has id: 83994816
Aug 08 17:18:46 [30369] KNTCLFS001 crmd: info: crm_new_peer: Node 83994816 is now known as KNTCLFS001
Aug 08 17:18:46 [30369] KNTCLFS001 crmd: info: ais_status_callback: status: KNTCLFS001 is now unknown
Aug 08 17:18:46 [30369] KNTCLFS001 crmd: error: init_quorum_connection: The Corosync quorum API is not supported in this build
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: error: pcmk_child_exit: Child process crmd exited (pid=30369, rc=100)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: warning: pcmk_child_exit: Pacemaker child process crmd no longer wishes to be respawned. Shutting ourselves down.
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: warning: send_ipc_message: IPC Channel to 30369 is not connected
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: notice: stop_child: Stopping pengine: Sent -15 to process 30368
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: info: pcmk_child_exit: Child process pengine exited (pid=30368, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: notice: stop_child: Stopping attrd: Sent -15 to process 30367
Aug 08 17:18:46 [30367] KNTCLFS001 attrd: notice: main: Exiting...
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: info: pcmk_child_exit: Child process attrd exited (pid=30367, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: warning: send_ipc_message: IPC Channel to 30367 is not connected
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: notice: stop_child: Stopping lrmd: Sent -15 to process 30366
Aug 08 17:18:46 KNTCLFS001 lrmd: [30366]: info: lrmd is shutting down
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: info: pcmk_child_exit: Child process lrmd exited (pid=30366, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: notice: stop_child: Stopping stonith-ng: Sent -15 to process 30365
Aug 08 17:18:46 [30365] KNTCLFS001 stonith-ng: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Aug 08 17:18:46 [30365] KNTCLFS001 stonith-ng: info: stonith_shutdown: Terminating with 0 clients
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: info: pcmk_child_exit: Child process stonith-ng exited (pid=30365, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: notice: stop_child: Stopping cib: Sent -15 to process 30364
Aug 08 17:18:46 [30364] KNTCLFS001 cib: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Aug 08 17:18:46 [30364] KNTCLFS001 cib: info: cib_shutdown: Disconnected 0 clients
Aug 08 17:18:46 [30364] KNTCLFS001 cib: info: cib_process_disconnect: All clients disconnected...
Aug 08 17:18:46 [30364] KNTCLFS001 cib: info: cib_ha_connection_destroy: Heartbeat disconnection complete... exiting
Aug 08 17:18:46 [30364] KNTCLFS001 cib: info: cib_ha_connection_destroy: Exiting...
Aug 08 17:18:46 [30364] KNTCLFS001 cib: info: crm_xml_cleanup: Cleaning up memory from libxml2
Aug 08 17:18:46 [30364] KNTCLFS001 cib: info: main: Done
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: info: pcmk_child_exit: Child process cib exited (pid=30364, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd: notice: pcmk_shutdown_worker: Attempting to inhibit respawning after fatal error

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Yount, William D
Sent: Wednesday, August 08, 2012 2:52 AM
To: [email protected]
Subject: [Linux-HA] Can't start pacemaker

I am following the "Clusters from Scratch" guide to set up a cluster on two CentOS 6.3 boxes. I am at the part where corosync is started and working correctly on both nodes. When I try to start pacemaker on either node, it keeps failing.
Here is the output from strace:

stat("/etc/init.d/pacemaker", {st_mode=S_IFREG|0755, st_size=2543, ...}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat(".", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
stat("/sbin/env", 0x7fff9f8d6bb0) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/env", 0x7fff9f8d6bb0) = -1 ENOENT (No such file or directory)
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
geteuid() = 0
getegid() = 0
getuid() = 0
getgid() = 0
access("/bin/env", X_OK) = 0
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
geteuid() = 0
getegid() = 0
getuid() = 0
getgid() = 0
access("/bin/env", R_OK) = 0
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
geteuid() = 0
getegid() = 0
getuid() = 0
getgid() = 0
access("/bin/env", X_OK) = 0
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
geteuid() = 0
getegid() = 0
getuid() = 0
getgid() = 0
access("/bin/env", R_OK) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT CHLD], NULL, 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f48bced69d0) = 3949
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x43d060, [], SA_RESTORER, 0x7f48bc53a920}, {SIG_DFL, [], SA_RESTORER, 0x7f48bc53a920}, 8) = 0
wait4(-1, Starting Pacemaker Cluster Manager: [FAILED]
[{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 3949
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fff9f8d671c, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f48bc53a920}, {0x43d060, [], SA_RESTORER, 0x7f48bc53a920}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
read(255, "", 1694) = 0
exit_group(1) = ?

I see the "No such file or directory" messages, but I am not sure what impact they have on the application.

I have been noticing that corosync spikes up to 100% CPU usage, which makes the entire system sluggish.

Here are the software versions:
centos-release-6-3.el6.centos.9.x86_64
corosynclib-1.4.1-7.el6.x86_64
corosync-1.4.1-7.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64

William
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
