I'm struggling to set up a PostgreSQL 9.3 HA Master/Slave cluster on
CentOS 7 (Corosync 2 and Pacemaker 1.1.10).
Please note that I've deliberately stopped node2 for now in order to keep
the scenario simpler (I hope).
After starting the cluster on node1, crm_mon shows the following:
Stack: corosync
Current DC: node1 (1) - partition WITHOUT quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
4 Resources configured
Online: [ node1 ]
OFFLINE: [ node2 ]
Full list of resources:
Master/Slave Set: pgsql_master_slave [pgsql]
Stopped: [ node1 node2 ]
Resource Group: master-group
pgsql_vip_rep (ocf::heartbeat:IPaddr2): Stopped
pgsql_forward_listen_port (ocf::heartbeat:portforward): Stopped
Node Attributes:
* Node node1:
+ master-pgsql : -INFINITY
+ pgsql-status : STOP
Migration summary:
* Node node1:
pgsql: migration-threshold=1 fail-count=1000000 last-failure='Thu Sep 18 11:39:06 2014'
Failed actions:
pgsql_start_0 on node1 'unknown error' (1): call=15, status=Timed Out, last-rc-change='Thu Sep 18 11:38:05 2014', queued=60028ms, exec=0ms
After running the following commands:
rm -f /var/lib/pgsql/9.3/data/recovery.conf
rm -f /var/lib/pgsql/9.3/data/ra_tmp/PGSQL.lock
I've verified that postgres can be started manually using:
systemctl start postgresql-9.3
That is not the point, of course, but I wanted to at least make sure that
the PostgreSQL configuration is not totally hosed.
I then tried the following on node1 (node2 is still switched off as
mentioned before):
rm -f /var/lib/pgsql/9.3/data/recovery.conf
rm -f /var/lib/pgsql/9.3/data/ra_tmp/PGSQL.lock
crm_attribute -l reboot -N $(uname -n) -n "pgsql-data-status" -v "LATEST"
crm_attribute -l reboot -N $(uname -n) -n "master-pgsql" -v "1000"
pcs resource cleanup pgsql
pcs resource cleanup pgsql_master_slave
pcs resource cleanup master-group
This briefly changes the crm_mon output to:
Online: [ node1 ]
OFFLINE: [ node2 ]
Full list of resources:
Master/Slave Set: pgsql_master_slave [pgsql]
Stopped: [ node1 node2 ]
Resource Group: master-group
pgsql_vip_rep (ocf::heartbeat:IPaddr2): Stopped
pgsql_forward_listen_port (ocf::heartbeat:portforward): Stopped
Node Attributes:
* Node node1:
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-status : STOP
Migration summary:
* Node node1:
As soon as the last resource cleanup completes, crm_mon reverts to the
first output shown above. I'm running out of ideas here. Any suggestions?
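
In case anyone wants to reproduce this, the reset steps I ran on node1 can be collected into one small script. This is only a sketch of the commands listed above: the PGDATA variable, the DRY_RUN switch and the function names run and reset_pgsql_node are made up for illustration and are not part of pcs, Pacemaker or the pgsql resource agent. By default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: the manual reset steps from this post, collected in one place.
# PGDATA, DRY_RUN, run() and reset_pgsql_node() are illustrative assumptions.
PGDATA="${PGDATA:-/var/lib/pgsql/9.3/data}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it unchanged.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_pgsql_node() {
    node="$(uname -n)"
    # Remove leftovers from the previous failed start.
    run rm -f "$PGDATA/recovery.conf"
    run rm -f "$PGDATA/ra_tmp/PGSQL.lock"
    # Set the node attributes exactly as in the commands above.
    run crm_attribute -l reboot -N "$node" -n "pgsql-data-status" -v "LATEST"
    run crm_attribute -l reboot -N "$node" -n "master-pgsql" -v "1000"
    # Clear the failure history so Pacemaker retries the start.
    for res in pgsql pgsql_master_slave master-group; do
        run pcs resource cleanup "$res"
    done
}

reset_pgsql_node
```

Running it with DRY_RUN=0 as root on node1 should execute the same seven commands in the same order as above.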