Re: [Linux-HA] Can't start pacemaker

Yount, William D Wed, 08 Aug 2012 15:40:50 -0700

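The problem seems to be with the Corosync quorum API: crmd logs "The Corosync 
quorum API is not supported in this build" and exits with rc=100, after which 
pacemakerd shuts everything down. Here is the output from corosync.log: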


Aug 08 17:18:45 [30364] KNTCLFS001        cib:     info: startCib:      CIB 
Initialization completed successfully
Aug 08 17:18:45 [30364] KNTCLFS001        cib:     info: get_cluster_type:      
Cluster type is: 'corosync'
Aug 08 17:18:45 [30364] KNTCLFS001        cib:   notice: crm_cluster_connect:   
Connecting to cluster infrastructure: corosync
Aug 08 17:18:45 [30364] KNTCLFS001        cib:     info: 
init_ais_connection_once:      Connection to 'corosync': established
Aug 08 17:18:45 [30364] KNTCLFS001        cib:     info: crm_new_peer:  Node 
KNTCLFS001 now has id: 83994816
Aug 08 17:18:45 [30364] KNTCLFS001        cib:     info: crm_new_peer:  Node 
83994816 is now known as KNTCLFS001
Aug 08 17:18:45 [30364] KNTCLFS001        cib:     info: cib_init:      
Starting cib mainloop
Aug 08 17:18:45 [30364] KNTCLFS001        cib:     info: set_crm_log_level:     
New log level: 3 0
Aug 08 17:18:46 [30369] KNTCLFS001       crmd:     info: do_cib_control:        
CIB connection established
Aug 08 17:18:46 [30369] KNTCLFS001       crmd:     info: get_cluster_type:      
Cluster type is: 'corosync'
Aug 08 17:18:46 [30369] KNTCLFS001       crmd:   notice: crm_cluster_connect:   
Connecting to cluster infrastructure: corosync
Aug 08 17:18:46 [30365] KNTCLFS001 stonith-ng:   notice: setup_cib:     
Watching for stonith topology changes
Aug 08 17:18:46 [30365] KNTCLFS001 stonith-ng:     info: main:  Starting 
stonith-ng mainloop
Aug 08 17:18:46 [30369] KNTCLFS001       crmd:     info: 
init_ais_connection_once:      Connection to 'corosync': established
Aug 08 17:18:46 [30369] KNTCLFS001       crmd:     info: crm_new_peer:  Node 
KNTCLFS001 now has id: 83994816
Aug 08 17:18:46 [30369] KNTCLFS001       crmd:     info: crm_new_peer:  Node 
83994816 is now known as KNTCLFS001
Aug 08 17:18:46 [30369] KNTCLFS001       crmd:     info: ais_status_callback:   
status: KNTCLFS001 is now unknown
Aug 08 17:18:46 [30369] KNTCLFS001       crmd:    error: 
init_quorum_connection:        The Corosync quorum API is not supported in this 
build
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:    error: pcmk_child_exit:       
Child process crmd exited (pid=30369, rc=100)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:  warning: pcmk_child_exit:       
Pacemaker child process crmd no longer wishes to be respawned. Shutting 
ourselves down.
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:  warning: send_ipc_message:      
IPC Channel to 30369 is not connected
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:   notice: pcmk_shutdown_worker:  
Shuting down Pacemaker
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:   notice: stop_child:    
Stopping pengine: Sent -15 to process 30368
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:     info: pcmk_child_exit:       
Child process pengine exited (pid=30368, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:   notice: stop_child:    
Stopping attrd: Sent -15 to process 30367
Aug 08 17:18:46 [30367] KNTCLFS001      attrd:   notice: main:  Exiting...
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:     info: pcmk_child_exit:       
Child process attrd exited (pid=30367, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:  warning: send_ipc_message:      
IPC Channel to 30367 is not connected
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:   notice: stop_child:    
Stopping lrmd: Sent -15 to process 30366
Aug 08 17:18:46 KNTCLFS001 lrmd: [30366]: info: lrmd is shutting down
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:     info: pcmk_child_exit:       
Child process lrmd exited (pid=30366, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:   notice: stop_child:    
Stopping stonith-ng: Sent -15 to process 30365
Aug 08 17:18:46 [30365] KNTCLFS001 stonith-ng:     info: crm_signal_dispatch:   
Invoking handler for signal 15: Terminated
Aug 08 17:18:46 [30365] KNTCLFS001 stonith-ng:     info: stonith_shutdown:      
Terminating with  0 clients
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:     info: pcmk_child_exit:       
Child process stonith-ng exited (pid=30365, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:   notice: stop_child:    
Stopping cib: Sent -15 to process 30364
Aug 08 17:18:46 [30364] KNTCLFS001        cib:     info: crm_signal_dispatch:   
Invoking handler for signal 15: Terminated
Aug 08 17:18:46 [30364] KNTCLFS001        cib:     info: cib_shutdown:  
Disconnected 0 clients
Aug 08 17:18:46 [30364] KNTCLFS001        cib:     info: 
cib_process_disconnect:        All clients disconnected...
Aug 08 17:18:46 [30364] KNTCLFS001        cib:     info: 
cib_ha_connection_destroy:     Heartbeat disconnection complete... exiting
Aug 08 17:18:46 [30364] KNTCLFS001        cib:     info: 
cib_ha_connection_destroy:     Exiting...
Aug 08 17:18:46 [30364] KNTCLFS001        cib:     info: crm_xml_cleanup:       
Cleaning up memory from libxml2
Aug 08 17:18:46 [30364] KNTCLFS001        cib:     info: main:  Done
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:     info: pcmk_child_exit:       
Child process cib exited (pid=30364, rc=0)
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:   notice: pcmk_shutdown_worker:  
Shutdown complete
Aug 08 17:18:46 [30360] KNTCLFS001 pacemakerd:   notice: pcmk_shutdown_worker:  
Attempting to inhibit respawning after fatal error
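
For anyone wading through a similar log, here is a minimal sketch that pulls out only the error-level entries. It assumes the line layout seen in the excerpt above (timestamp, [pid], host, daemon, level, function, message) and that entries may be wrapped across lines by the mail client. The names `unwrap` and `errors` are made up for this illustration and are not part of Pacemaker or Corosync.

```python
import re

# Assumed entry shape, taken from the excerpt above:
#   "Aug 08 17:18:46 [30369] KNTCLFS001       crmd:    error: func: message"
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} ")
ENTRY = re.compile(
    r"^(?P<time>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \[(?P<pid>\d+)\] (?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+):\s+(?P<level>\w+):\s+(?P<func>\w+):\s+(?P<msg>.*)$"
)


def unwrap(lines):
    """Join wrapped continuation lines back onto the entry they belong to."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] = entries[-1].rstrip() + " " + line.strip()
    return entries


def errors(lines):
    """Return (daemon, function, message) for every error-level entry.

    Entries that do not fit the assumed layout (for example the lrmd line,
    which uses a different format) are skipped rather than guessed at.
    """
    found = []
    for entry in unwrap(lines):
        m = ENTRY.match(entry)
        if m and m.group("level") == "error":
            found.append((m.group("daemon"), m.group("func"), m.group("msg")))
    return found


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_errors.py corosync.log
    with open(sys.argv[1]) as fh:
        for daemon, func, msg in errors(fh):
            print(f"{daemon}: {func}: {msg}")
```

Run against the log above, this should leave just the crmd init_quorum_connection error and the pacemakerd pcmk_child_exit error, which are the two lines that matter here.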

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Yount, William D
Sent: Wednesday, August 08, 2012 2:52 AM
To: [email protected]
Subject: [Linux-HA] Can't start pacemaker

I am following the "Clusters from Scratch" guide to set up a cluster on two 
CentOS 6.3 boxes. Corosync starts and works correctly on both nodes, but when 
I try to start pacemaker on either node, it fails every time. Here is the 
output from strace:

stat("/etc/init.d/pacemaker", {st_mode=S_IFREG|0755, st_size=2543, ...}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat(".", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
stat("/sbin/env", 0x7fff9f8d6bb0)       = -1 ENOENT (No such file or directory)
stat("/usr/sbin/env", 0x7fff9f8d6bb0)   = -1 ENOENT (No such file or directory)
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0 
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
geteuid()                               = 0
getegid()                               = 0
getuid()                                = 0
getgid()                                = 0
access("/bin/env", X_OK)                = 0
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
geteuid()                               = 0
getegid()                               = 0
getuid()                                = 0
getgid()                                = 0
access("/bin/env", R_OK)                = 0
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0 
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
geteuid()                               = 0
getegid()                               = 0
getuid()                                = 0
getgid()                                = 0
access("/bin/env", X_OK)                = 0
stat("/bin/env", {st_mode=S_IFREG|0755, st_size=23832, ...}) = 0
geteuid()                               = 0
getegid()                               = 0
getuid()                                = 0
getgid()                                = 0
access("/bin/env", R_OK)                = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT CHLD], NULL, 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f48bced69d0) = 3949
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x43d060, [], SA_RESTORER, 0x7f48bc53a920}, {SIG_DFL, [], SA_RESTORER, 0x7f48bc53a920}, 8) = 0
wait4(-1, Starting Pacemaker Cluster Manager:              [FAILED]
[{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 3949 
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fff9f8d671c, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff)        = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f48bc53a920}, {0x43d060, [], SA_RESTORER, 0x7f48bc53a920}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "", 1694)                     = 0
exit_group(1)                           = ?
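
The ENOENT results in the trace above look consistent with a shell walking its search path for `env` (/sbin, then /usr/sbin, then /bin, where it is found), rather than with a real failure. As a rough sketch of how to check that mechanically, the following groups stat/access probes by file name and reports the misses whose name was found at another path. It assumes strace's default one-call-per-line output; `path_lookups` and `harmless_misses` are names invented for this example.

```python
import os
import re

# Matches calls whose first argument is a quoted path, e.g.
#   stat("/sbin/env", 0x7fff9f8d6bb0)  = -1 ENOENT (No such file or directory)
CALL = re.compile(
    r'^(?P<name>\w+)\("(?P<path>[^"]+)".*\)\s+= (?P<ret>-?\d+)(?: (?P<errno>[A-Z]+))?'
)


def path_lookups(lines):
    """Map each base name to the (path, found) probes seen in the trace."""
    probes = {}
    for line in lines:
        m = CALL.match(line)
        if not m or m.group("name") not in ("stat", "access"):
            continue
        path = m.group("path")
        found = m.group("ret") == "0"
        probes.setdefault(os.path.basename(path), []).append((path, found))
    return probes


def harmless_misses(lines):
    """Paths that failed, but whose base name was found at another path."""
    result = []
    for seen in path_lookups(lines).values():
        if any(found for _, found in seen):
            result.extend(path for path, found in seen if not found)
    return result
```

If every failed path shows up in `harmless_misses`, the "No such file or directory" lines are probably just search-path probing, and the cause of the exit status 1 has to be looked for elsewhere, for example in the daemon's own log.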

I see the "No such file or directory" messages, but I am not sure what impact 
they have on the application. I have also noticed that corosync spikes up to 
100% CPU usage, which makes the entire system sluggish. Here are the software 
versions:
centos-release-6-3.el6.centos.9.x86_64
corosynclib-1.4.1-7.el6.x86_64
corosync-1.4.1-7.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64



William

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
