Re: [Linux-HA] After Startup, Can't Connect to CIB, Pacemaker Eventually Dies

2016-07-23 Thread Digimer
Please post to the Clusterlabs users list. This list is deprecated.

http://clusterlabs.org/mailman/listinfo/users

digimer


Re: [Linux-HA] After Startup, Can't Connect to CIB, Pacemaker Eventually Dies

2016-07-23 Thread Eric Robinson
> I've seen very interesting behaviours after mistyping netmasks in various 
> places: iptables rules, interface configs, etc.

Thanks for the thought. Iptables is off. Interface configs are correct. I don't 
see any place where the masks are wrong.

--Eric

___
Linux-HA mailing list is closing down.
Please subscribe to us...@clusterlabs.org instead.
http://clusterlabs.org/mailman/listinfo/users
___
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha


Re: [Linux-HA] After Startup, Can't Connect to CIB, Pacemaker Eventually Dies

2016-07-23 Thread Dmitri Maziuk

On 7/23/2016 1:53 AM, Eric Robinson wrote:

> I've created 15 or so Corosync+Pacemaker clusters and never had this kind of
> issue.


I've seen very interesting behaviours after mistyping netmasks in 
various places: iptables rules, interface configs, etc.
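
A quick way to rule out a netmask typo is to check that the peer's address actually falls inside the network implied by the local interface's address and prefix. This is only a minimal sketch: the addresses and prefix lengths below are hypothetical and do not come from this thread, and `same_subnet` is an illustrative helper name.

```python
import ipaddress


def same_subnet(local_cidr: str, peer_ip: str) -> bool:
    """Return True if peer_ip lies inside the network implied by local_cidr."""
    iface = ipaddress.ip_interface(local_cidr)
    return ipaddress.ip_address(peer_ip) in iface.network


# Hypothetical values for illustration only.
print(same_subnet("192.168.8.41/24", "192.168.8.42"))  # True
print(same_subnet("192.168.8.41/28", "192.168.8.62"))  # False: /28 covers only .32-.47
```

With a mistyped prefix such as /28 instead of /24, the second node can look unreachable on the local segment even though both addresses appear correct at a glance.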


FWIW
Dima




[Linux-HA] After Startup, Can't Connect to CIB, Pacemaker Eventually Dies

2016-07-23 Thread Eric Robinson
I've created 15 or so Corosync+Pacemaker clusters and never had this kind of 
issue.

These servers are running the following software:

RHEL 6.3
pacemaker-libs-1.1.12-8.el6_7.2.x86_64
pacemaker-1.1.12-8.el6_7.2.x86_64
corosync-1.4.7-5.el6.x86_64
pacemaker-cluster-libs-1.1.12-8.el6_7.2.x86_64
pacemaker-cli-1.1.12-8.el6_7.2.x86_64
corosynclib-1.4.7-5.el6.x86_64
crmsh-2.0-1.el6.x86_64

Corosync starts fine and both nodes join the cluster.
Pacemaker appears to start fine, but 'crm configure show' produces the error...

[root@ha14b ~]# crm configure show
ERROR: running cibadmin -Ql: Could not establish cib_rw connection: Connection refused (111)
Signon to CIB failed: Transport endpoint is not connected
Init failed, could not perform requested operations
ERROR: configure: Missing requirements

After a short while Pacemaker dies...

[root@ha14b ~]# service pacemaker status
pacemakerd dead but pid file exists
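
The "dead but pid file exists" status means the init script found a pid file whose process is no longer running. The check can be approximated as in the sketch below. It is not the actual init script logic; `pidfile_status` is an illustrative helper, and the pid file path in the usage comment is an assumption that may differ on a given system.

```python
import os


def pidfile_status(pidfile: str) -> str:
    """Roughly classify a daemon the way a SysV-style 'status' check does."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return "stopped"
    except ValueError:
        return "pid file unreadable"
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only tests that the pid exists
    except ProcessLookupError:
        return "dead but pid file exists"
    except PermissionError:
        return "running"  # the process exists but belongs to another user
    return "running"


# Usage (the path is an assumption, adjust for the system at hand):
# print(pidfile_status("/var/run/pacemakerd.pid"))
```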

The Pacemaker log shows the following...

[root@ha14a log]# cat pacemaker.log
Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_ipc_connect: Could not establish pacemakerd connection: Connection refused (111)
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: config_find_next: Processing additional service options...
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'pacemaker' for option: name
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found '1' for option: ver
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_cluster_type: Detected an active 'classic openais (with plugin)' cluster
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: mcp_read_config: Reading configure for stack: classic openais (with plugin)
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: config_find_next: Processing additional service options...
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'pacemaker' for option: name
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found '1' for option: ver
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Defaulting to 'no' for option: use_logd
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: config_find_next: Processing additional logging options...
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'off' for option: debug
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'yes' for option: to_logfile
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found '/var/log/corosync.log' for option: logfile
Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'yes' for option: to_syslog
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: main: Starting Pacemaker 1.1.11 (Build: 97629de):  generated-manpages agent-manpages ascii-docs ncurses libqb-logging libqb-ipc nagios  corosync-plugin cman acls
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: main: Maximum core file size is: 18446744073709551615
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 688433344
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Created entry 503d43d2-c016-4537-97b6-8f0dcfc5384d/0x1a0 for node (null)/688433344 (1 total)
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Cannot obtain a UUID for node 688433344/(null)
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[688433344] - corosync-cpg is now online
Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Node 688433344 is now known as ha14a
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Node 688433344 has uuid ha14a
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: start_child: Using uid=189 and group=189 for process cib
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: start_child: Forked child 4622 for process cib
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: start_child: Forked child 4623 for process stonith-ng
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: start_child: Forked child 4624 for process lrmd
Jul 22 23:29:45
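
The nodeid 688433344 in the log above looks like an auto-generated Corosync node ID. When no nodeid is set in corosync.conf, Corosync derives it from the ring 0 IPv4 address, so the number can be decoded back into an address to confirm which interface Corosync bound to. A minimal sketch, assuming the ID was auto-generated on a little-endian (x86_64) host; `nodeid_to_ipv4` is an illustrative helper name, not a Corosync or Pacemaker tool.

```python
import socket
import struct


def nodeid_to_ipv4(nodeid: int) -> str:
    """Decode an auto-generated Corosync nodeid into a dotted-quad address.

    Assumption: the nodeid is the IPv4 address bytes read as a native
    32-bit integer on a little-endian host.
    """
    return socket.inet_ntoa(struct.pack("<I", nodeid))


print(nodeid_to_ipv4(688433344))  # 192.168.8.41
```

If the decoded address is not the one expected for the cluster interconnect, the bindnetaddr setting and the interface netmask are worth a second look.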