I'm currently struggling a bit with setting up a PostgreSQL 9.3 HA
Master/Slave cluster on CentOS 7 (Corosync 2 and Pacemaker 1.1.10).

Please note that I've deliberately stopped node2 for now in order to keep
the scenario simpler (I hope).

After starting the cluster on node1, crm_mon shows the following:

Stack: corosync
Current DC: node1 (1) - partition WITHOUT quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
4 Resources configured

Online: [ node1 ]
OFFLINE: [ node2 ]

Full list of resources:

 Master/Slave Set: pgsql_master_slave [pgsql]
     Stopped: [ node1 node2 ]
 Resource Group: master-group
     pgsql_vip_rep      (ocf::heartbeat:IPaddr2):       Stopped
     pgsql_forward_listen_port  (ocf::heartbeat:portforward):   Stopped

Node Attributes:
* Node node1:
    + master-pgsql                      : -INFINITY
    + pgsql-status                      : STOP

Migration summary:
* Node node1:
   pgsql: migration-threshold=1 fail-count=1000000 last-failure='Thu Sep 18 11:39:06 2014'

Failed actions:
    pgsql_start_0 on node1 'unknown error' (1): call=15, status=Timed Out, last-rc-change='Thu Sep 18 11:38:05 2014', queued=60028ms, exec=0ms
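
If I read this correctly, fail-count=1000000 is Pacemaker's INFINITY, so with
migration-threshold=1 a single failed start bans pgsql on node1 until the
failure is cleared. The start op also timed out after roughly 60s
(queued=60028ms). One thing I've been considering (the 120s value is just a
guess on my part, not something I've verified as sufficient):

```shell
# Show the fail-count that is currently banning pgsql on this node
pcs resource failcount show pgsql

# If startup recovery is simply slow, raise the start timeout
# (120s is an assumption; pick a value that fits your data size)
pcs resource update pgsql op start timeout=120s

# Then clear the failure so Pacemaker retries the start
pcs resource cleanup pgsql
```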

After running the following commands:

rm -f /var/lib/pgsql/9.3/data/recovery.conf 
rm -f /var/lib/pgsql/9.3/data/ra_tmp/PGSQL.lock 

I've verified that postgres can be started manually using:

systemctl start postgresql-9.3

That is not the point, of course, but I wanted to at least make sure that
the pgsql configuration itself is not totally hosed.
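
Before letting Pacemaker try again, I stop the manually started instance,
since (as far as I understand the pgsql resource agent) a server already
running outside cluster control would make the RA's start/monitor fail:

```shell
# Hand control back to the cluster: stop the manually started server
systemctl stop postgresql-9.3

# Confirm it is really down before triggering a cluster start
systemctl is-active postgresql-9.3
```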

I then tried the following on node1 (node2 is still switched off, as
mentioned above):

rm -f /var/lib/pgsql/9.3/data/recovery.conf 
rm -f /var/lib/pgsql/9.3/data/ra_tmp/PGSQL.lock 
crm_attribute -l reboot -N $(uname -n) -n "pgsql-data-status" -v "LATEST" 
crm_attribute -l reboot -N $(uname -n) -n "master-pgsql" -v "1000" 
pcs resource cleanup pgsql
pcs resource cleanup pgsql_master_slave 
pcs resource cleanup master-group 
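
To double-check that the attribute changes actually took before the cleanup
reverted things, crm_attribute can query them back with -G:

```shell
# Query the transient (reboot-scoped) attributes back from the CIB
crm_attribute -l reboot -N $(uname -n) -n "pgsql-data-status" -G
crm_attribute -l reboot -N $(uname -n) -n "master-pgsql" -G
```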

This briefly changes the crm_mon output to the following:

Online: [ node1 ]
OFFLINE: [ node2 ]

Full list of resources:

 Master/Slave Set: pgsql_master_slave [pgsql]
     Stopped: [ node1 node2 ]
 Resource Group: master-group
     pgsql_vip_rep      (ocf::heartbeat:IPaddr2):       Stopped
     pgsql_forward_listen_port  (ocf::heartbeat:portforward):   Stopped

Node Attributes:
* Node node1:
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-status                      : STOP

Migration summary:
* Node node1:

As soon as the last resource cleanup is done, the situation reverts to the
first "picture". I'm really running out of ideas here. Any suggestions?
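
For completeness, this is where I've been looking for the resource agent's
error output so far (log locations are the CentOS 7 defaults and may differ
on other setups):

```shell
# Messages from pacemaker/lrmd and the pgsql RA around the failed start
grep -i pgsql /var/log/messages | tail -50

# Pacemaker's journal entries for the same time window
journalctl -u pacemaker --since "2014-09-18 11:37" | grep -i pgsql
```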



--
View this message in context: 
http://linux-ha.996297.n3.nabble.com/Unable-to-start-any-node-of-pgsql-Master-Slave-Cluster-tp15816.html
Sent from the Linux-HA mailing list archive at Nabble.com.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems