Re: [Linux-HA] After Startup, Can't Connect to CIB, Pacemaker Eventually Dies
Please post to the clusterlabs users list. This list is deprecated.

http://clusterlabs.org/mailman/listinfo/users

digimer

On 23/07/16 02:53 AM, Eric Robinson wrote:
> I've created 15 or so Corosync+Pacemaker clusters and never had this kind of issue.
>
> These servers are running the following software:
>
> RHEL 6.3
> pacemaker-libs-1.1.12-8.el6_7.2.x86_64
> pacemaker-1.1.12-8.el6_7.2.x86_64
> corosync-1.4.7-5.el6.x86_64
> pacemaker-cluster-libs-1.1.12-8.el6_7.2.x86_64
> pacemaker-cli-1.1.12-8.el6_7.2.x86_64
> corosynclib-1.4.7-5.el6.x86_64
> crmsh-2.0-1.el6.x86_64
>
> Corosync starts fine and both nodes join the cluster.
> Pacemaker appears to start fine, but 'crm configure show' produces the error...
>
> [root@ha14b ~]# crm configure show
> ERROR: running cibadmin -Ql: Could not establish cib_rw connection: Connection refused (111)
> Signon to CIB failed: Transport endpoint is not connected
> Init failed, could not perform requested operations
> ERROR: configure: Missing requirements
>
> After a short while Pacemaker dies...
>
> [root@ha14b ~]# service pacemaker status
> pacemakerd dead but pid file exists
>
> The Pacemaker log shows the following...
>
> [root@ha14a log]# cat pacemaker.log
> Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_ipc_connect: Could not establish pacemakerd connection: Connection refused (111)
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: config_find_next: Processing additional service options...
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'pacemaker' for option: name
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found '1' for option: ver
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_cluster_type: Detected an active 'classic openais (with plugin)' cluster
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: mcp_read_config: Reading configure for stack: classic openais (with plugin)
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: config_find_next: Processing additional service options...
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'pacemaker' for option: name
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found '1' for option: ver
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Defaulting to 'no' for option: use_logd
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: config_find_next: Processing additional logging options...
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'off' for option: debug
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'yes' for option: to_logfile
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found '/var/log/corosync.log' for option: logfile
> Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'yes' for option: to_syslog
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
> Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: main: Starting Pacemaker 1.1.11 (Build: 97629de): generated-manpages agent-manpages ascii-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman acls
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: main: Maximum core file size is: 18446744073709551615
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
> Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 688433344
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Created entry 503d43d2-c016-4537-97b6-8f0dcfc5384d/0x1a0 for node (null)/688433344 (1 total)
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Cannot obtain a UUID for node 688433344/(null)
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[688433344] - corosync-cpg is now online
> Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Node 688433344 is now known as ha14a
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Node 688433344 has uuid ha14a
> Jul 22 23:29:45 [4616] ha14a pacemakerd: info: start_child: Using uid=189 and group=189
Re: [Linux-HA] After Startup, Can't Connect to CIB, Pacemaker Eventually Dies
> I've seen very interesting behaviours after mistyping netmasks in various
> places: iptables rules, interface configs, etc.

Thanks for the thought. Iptables is off. Interface configs are correct. I don't see any place where the masks are wrong.

--Eric

___
Linux-HA mailing list is closing down.
Please subscribe to us...@clusterlabs.org instead.
http://clusterlabs.org/mailman/listinfo/users
___
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
Re: [Linux-HA] After Startup, Can't Connect to CIB, Pacemaker Eventually Dies
On 7/23/2016 1:53 AM, Eric Robinson wrote:
> I've created 15 or so Corosync+Pacemaker clusters and never had this kind of issue.

I've seen very interesting behaviours after mistyping netmasks in various places: iptables rules, interface configs, etc.

FWIW
Dima
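One way to cross-check the address side of this is to decode the nodeid that pacemakerd reports in the log from this thread (688433344). The following is only a sketch. It assumes the nodeid was auto-generated from the ring0 IPv4 address, which is what Corosync 1.x does when no nodeid is set in corosync.conf, and that the host is little-endian (the packages are x86_64). If a nodeid was configured by hand, the decoded value means nothing. The function names, the peer address, and the /24 prefix are placeholders and do not come from the original post.

```python
import ipaddress
import struct


def nodeid_to_ipv4(nodeid):
    """Decode an auto-generated nodeid back to an IPv4 address.

    Assumption: the nodeid is the 4 address bytes (network order) read as a
    native 32-bit integer on a little-endian host, so packing the integer
    little-endian yields the address bytes in their original order.
    """
    return ipaddress.IPv4Address(struct.pack("<I", nodeid))


def same_subnet(addr_a, addr_b, prefix):
    """Return True if both addresses fall in the same network for this prefix."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix}").network
    return net_a == net_b


if __name__ == "__main__":
    local = nodeid_to_ipv4(688433344)
    print(local)  # 192.168.8.41

    # Hypothetical peer address and prefix length, for illustration only.
    print(same_subnet(str(local), "192.168.8.42", 24))
```

If the decoded address is not the one expected on the cluster interface, or the two nodes do not land in the same network under the configured mask, that would point back at the interface or bindnetaddr settings.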
[Linux-HA] After Startup, Can't Connect to CIB, Pacemaker Eventually Dies
I've created 15 or so Corosync+Pacemaker clusters and never had this kind of issue.

These servers are running the following software:

RHEL 6.3
pacemaker-libs-1.1.12-8.el6_7.2.x86_64
pacemaker-1.1.12-8.el6_7.2.x86_64
corosync-1.4.7-5.el6.x86_64
pacemaker-cluster-libs-1.1.12-8.el6_7.2.x86_64
pacemaker-cli-1.1.12-8.el6_7.2.x86_64
corosynclib-1.4.7-5.el6.x86_64
crmsh-2.0-1.el6.x86_64

Corosync starts fine and both nodes join the cluster.
Pacemaker appears to start fine, but 'crm configure show' produces the error...

[root@ha14b ~]# crm configure show
ERROR: running cibadmin -Ql: Could not establish cib_rw connection: Connection refused (111)
Signon to CIB failed: Transport endpoint is not connected
Init failed, could not perform requested operations
ERROR: configure: Missing requirements

After a short while Pacemaker dies...

[root@ha14b ~]# service pacemaker status
pacemakerd dead but pid file exists

The Pacemaker log shows the following...

[root@ha14a log]# cat pacemaker.log
Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_ipc_connect: Could not establish pacemakerd connection: Connection refused (111)
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: config_find_next: Processing additional service options...
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'pacemaker' for option: name
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found '1' for option: ver
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_cluster_type: Detected an active 'classic openais (with plugin)' cluster
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: mcp_read_config: Reading configure for stack: classic openais (with plugin)
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: config_find_next: Processing additional service options...
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'pacemaker' for option: name
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found '1' for option: ver
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Defaulting to 'no' for option: use_logd
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: config_find_next: Processing additional logging options...
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'off' for option: debug
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'yes' for option: to_logfile
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found '/var/log/corosync.log' for option: logfile
Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'yes' for option: to_syslog
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: main: Starting Pacemaker 1.1.11 (Build: 97629de): generated-manpages agent-manpages ascii-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman acls
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: main: Maximum core file size is: 18446744073709551615
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 688433344
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Created entry 503d43d2-c016-4537-97b6-8f0dcfc5384d/0x1a0 for node (null)/688433344 (1 total)
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Cannot obtain a UUID for node 688433344/(null)
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[688433344] - corosync-cpg is now online
Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Node 688433344 is now known as ha14a
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: crm_get_peer: Node 688433344 has uuid ha14a
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: start_child: Using uid=189 and group=189 for process cib
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: start_child: Forked child 4622 for process cib
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: start_child: Forked child 4623 for process stonith-ng
Jul 22 23:29:45 [4616] ha14a pacemakerd: info: start_child: Forked child 4624 for process lrmd
Jul 22 23:29:45
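With a startup log this long, most of which is info-level noise, it can help to split each entry into fields and look only at the higher-severity ones. The sketch below is written against the entry layout visible in the log above (timestamp, [pid], host, daemon, level, function, message). The regular expression and helper names are illustrative and are not part of Pacemaker. Entries that do not match, such as the truncated last line, are skipped.

```python
import re

# One pacemaker.log entry as it appears in the post:
#   <Mon> <day> <HH:MM:SS> [<pid>] <host> <daemon>: <level>: <function>: <message>
# The whitespace after the function name is optional.
LOG_ENTRY = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] (?P<host>\S+) (?P<daemon>[\w-]+):\s+"
    r"(?P<level>\w+):\s+(?P<func>\w+):\s*(?P<msg>.*)$"
)


def parse_entry(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_ENTRY.match(line.strip())
    return match.groupdict() if match else None


def above_info(lines):
    """Yield parsed entries whose level is anything other than 'info'."""
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and entry["level"] != "info":
            yield entry


if __name__ == "__main__":
    sample = [
        "Jul 22 23:29:45 [4616] ha14a pacemakerd: info: get_config_opt: Found 'off' for option: debug",
        "Jul 22 23:29:45 [4616] ha14a pacemakerd: notice: get_node_name: Could not obtain a node name "
        "for classic openais (with plugin) nodeid 688433344",
        "Jul 22 23:29:45",  # truncated entry, ignored
    ]
    for entry in above_info(sample):
        print(entry["level"], entry["func"], entry["msg"])
```

The same helper can be pointed at /var/log/corosync.log, which the log above names as the additional log file, by passing an open file object to above_info.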