[Linux-HA] problem with pgsql streaming resource agent

Jeff Frost Mon, 08 Jul 2013 10:41:37 -0700

We're testing the pgsql master/slave streaming replication resource agent
found here:

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql

Using the example 2-node configuration found here as a template:

https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication

we came up with the following configuration:


node node1
node node2
primitive pgsql ocf:heartbeat:pgsql \
  params pgctl="/usr/pgsql-9.2/bin/pg_ctl" psql="/usr/pgsql-9.2/bin/psql" \
    pgdata="/var/lib/pgsql/9.2/data/" start_opt="-p 5432" rep_mode="async" \
    node_list="node1 node2" repuser="replicauser" \
    restore_command="rsync -aq /var/lib/pgsql/wal_archive/%f %p" \
    master_ip="192.168.253.104" stop_escalate="0" \
  op start interval="0s" role="Master" timeout="60s" on-fail="block"
primitive vip-master ocf:heartbeat:IPaddr2 \
  params ip="192.168.254.104" nic="eth0" cidr_netmask="24" \
  op start interval="0s" timeout="60s" on-fail="block"
primitive vip-rep ocf:heartbeat:IPaddr2 \
  params ip="192.168.253.104" nic="eth1" cidr_netmask="24" \
    migration-threshold="0" \
  op start interval="0s" timeout="60s" on-fail="block"
group master-group vip-master vip-rep
ms msPostgresql pgsql \
  meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
    notify="true"
colocation colocation-master-group-msPostgresql-INFINITY inf: master-group \
  msPostgresql:Master
order order-msPostgresql-master-group-mandatory : msPostgresql:promote \
  master-group:start symmetrical=false
property $id="cib-bootstrap-options" \
  dc-version="1.1.8-7.el6-394e906" \
  cluster-infrastructure="cman" \
  no-quorum-policy="ignore" \
  stonith-enabled="false" \
  migration-threshold="1"
rsc_defaults $id="rsc_defaults-options" \
  resource-stickiness="INFINITY" \
  migration-threshold="1"

Unfortunately, when the cluster is started up, we end up with two slaves and 
neither gets promoted:

Last updated: Mon Jul  8 17:29:10 2013
Last change: Mon Jul  8 17:28:28 2013 via cibadmin on node1
Stack: cman
Current DC: node1 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
4 Resources configured.


Online: [ node1 node2 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Slaves: [ node1 node2 ]
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Stopped
     vip-rep  (ocf::heartbeat:IPaddr2): Stopped

Node Attributes:
* Node node1:
    + master-pgsql                      : -INFINITY
    + pgsql-status                      : HS:alone
* Node node2:
    + master-pgsql                      : -INFINITY
    + pgsql-status                      : HS:alone

Migration summary:
* Node node1:
* Node node2:
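
Both nodes report master-pgsql at -INFINITY, and Pacemaker will not promote
an instance whose master score is -INFINITY. As a rough sketch (not part of
the resource agent), the scores can be pulled out of crm_mon-style output
with a small filter. The function names master_scores and blocked_nodes are
made up for illustration, and the input is assumed to look like the "Node
Attributes" section shown above.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text in the format shown above.

# Print "<node> <score>" for every master-pgsql attribute read on stdin.
master_scores() {
    awk '
        /^\* Node /       { node = $3; sub(/:$/, "", node) }
        /\+ master-pgsql/ { print node, $NF }
    '
}

# Print only the nodes whose master-pgsql score is -INFINITY.
blocked_nodes() {
    master_scores | awk '$2 == "-INFINITY" { print $1 }'
}
```

Assuming crm_mon's -A (show node attributes) and -1 (one-shot) options, it
could be used as: crm_mon -A -1 | blocked_nodes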

Here are the various package versions:

rpm -q pacemaker crmsh resource-agents cman corosync
pacemaker-1.1.8-7.el6.x86_64
crmsh-1.2.5-55.8.x86_64
resource-agents-3.9.2-21.el6_4.3.x86_64
cman-3.0.12.1-49.el6.x86_64
corosync-1.4.1-15.el6_4.1.x86_64

I initially configured this with pcs, but installed crmsh to double-check my 
crm-to-pcs translations.

I installed the pgsql resource agent by hand from the git repo.  I tried the 
latest commit as well as a few earlier ones, in case there was an issue with 
a particular revision.

Any insight would be much appreciated.


Here's "grep pgsql /var/log/messages":

Jul  8 17:28:28 node1 cibadmin[16121]:   notice: crm_log_args: Invoked: /usr/sbin/cibadmin --replace --xml-file pgsql_cfg
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++         <primitive class="ocf" id="pgsql" provider="heartbeat" type="pgsql" >
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++           <instance_attributes id="pgsql-instance_attributes" >
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-pgctl" name="pgctl" value="/usr/pgsql-9.2/bin/pg_ctl" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-psql" name="psql" value="/usr/pgsql-9.2/bin/psql" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-pgdata" name="pgdata" value="/var/lib/pgsql/9.2/data/" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-start_opt" name="start_opt" value="-p 5432" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-rep_mode" name="rep_mode" value="async" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-node_list" name="node_list" value="node1 node2" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-repuser" name="repuser" value="replicauser" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-restore_command" name="restore_command" value="rsync -aq /var/lib/pgsql/wal_archive/%f %p" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-master_ip" name="master_ip" value="192.168.253.104" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <nvpair id="pgsql-instance_attributes-stop_escalate" name="stop_escalate" value="0" />
Jul  8 17:28:28 node1 cib[22549]:   notice: cib:diff: ++             <op id="pgsql-interval-0s" interval="0s" name="start" on-fail="block" role="Master" timeout="60s" />
Jul  8 17:28:28 node1 attrd[22552]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-pgsql (-INFINITY)
Jul  8 17:28:28 node1 attrd[22552]:   notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (UNKNOWN)
Jul  8 17:28:28 node1 pengine[22553]:   notice: LogActions: Start   pgsql:0#011(node1)
Jul  8 17:28:28 node1 pengine[22553]:   notice: LogActions: Start   pgsql:1#011(node2)
Jul  8 17:28:28 node1 attrd[22552]:   notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
Jul  8 17:28:28 node1 attrd[22552]:   notice: attrd_perform_update: Sent update 1054: pgsql-status=STOP
Jul  8 17:28:29 node1 attrd[22552]:   notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
Jul  8 17:28:29 node1 attrd[22552]:   notice: attrd_perform_update: Sent update 1064: pgsql-status=HS:alone
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 INFO: pgsql_replication_start ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 INFO: Changing pgsql-status on node1 : UNKNOWN->STOP. ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 INFO: server starting ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 INFO: PostgreSQL start command sent. ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ psql: could not connect to server: No such file or directory ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ #011Is the server running locally and accepting ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ #011connections on Unix domain socket "/tmp/.s.PGSQL.5432"? ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 WARNING: Can't get PostgreSQL recovery status. rc=2 ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command. ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:29 INFO: PostgreSQL is started. ]
Jul  8 17:28:29 node1 lrmd[22551]:   notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:29 INFO: Changing pgsql-status on node1 : STOP->HS:alone. ]
Jul  8 17:28:30 node1 crmd[22554]:   notice: process_lrm_event: LRM operation pgsql_start_0 (call=160, rc=0, cib-update=1105, confirmed=true) ok
Jul  8 17:28:30 node1 crmd[22554]:   notice: process_lrm_event: LRM operation pgsql_notify_0 (call=163, rc=0, cib-update=0, confirmed=true) ok


Jul  8 17:28:28 node2 attrd[19741]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-pgsql (-INFINITY)
Jul  8 17:28:28 node2 attrd[19741]:   notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (UNKNOWN)
Jul  8 17:28:28 node2 attrd[19741]:   notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
Jul  8 17:28:28 node2 attrd[19741]:   notice: attrd_perform_update: Sent update 1252: pgsql-status=STOP
Jul  8 17:28:29 node2 attrd[19741]:   notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
Jul  8 17:28:29 node2 attrd[19741]:   notice: attrd_perform_update: Sent update 1256: pgsql-status=HS:alone
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 INFO: pgsql_replication_start ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 INFO: Changing pgsql-status on node2 : UNKNOWN->STOP. ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 INFO: server starting ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 INFO: PostgreSQL start command sent. ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ psql: could not connect to server: No such file or directory ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ #011Is the server running locally and accepting ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ #011connections on Unix domain socket "/tmp/.s.PGSQL.5432"? ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 WARNING: Can't get PostgreSQL recovery status. rc=2 ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command. ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:29 INFO: PostgreSQL is started. ]
Jul  8 17:28:29 node2 lrmd[19740]:   notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:29 INFO: Changing pgsql-status on node2 : STOP->HS:alone. ]
Jul  8 17:28:30 node2 crmd[19743]:   notice: process_lrm_event: LRM operation pgsql_start_0 (call=118, rc=0, cib-update=145, confirmed=true) ok
Jul  8 17:28:30 node2 crmd[19743]:   notice: process_lrm_event: LRM operation pgsql_notify_0 (call=121, rc=0, cib-update=0, confirmed=true) ok
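
The resource agent's own messages are buried inside the lrmd
"operation_finished" lines. As a convenience, here is a minimal sketch of a
filter that keeps only those messages and prefixes them with the node name.
The function name ra_messages is hypothetical, and it assumes the syslog
layout shown above (hostname in the fourth field, one entry per line as in
/var/log/messages itself).

```shell
#!/bin/sh
# Hypothetical helper: reduce lrmd "operation_finished" syslog lines on
# stdin to "<node>: <resource agent message>".
ra_messages() {
    awk '/ lrmd\[[0-9]+\]: .*operation_finished: / {
        node = $4                                        # syslog hostname field
        sub(/^.*operation_finished: [^ ]+ \[ /, "")      # drop prefix up to "[ "
        sub(/ \]$/, "")                                  # drop trailing " ]"
        print node ": " $0
    }'
}
```

For example: grep pgsql /var/log/messages | ra_messages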

Finally, I confirmed that streaming replication works with these hosts if set 
up outside the context of pacemaker.
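
For anyone wanting to repeat that kind of check, one possible way to see
which role a running instance has is to ask it whether it is in recovery.
This is only a sketch under assumptions: the psql path and port are taken
from the configuration above, the "postgres" database user and the function
name pg_role are illustrative, and pg_is_in_recovery() is the standard
PostgreSQL function that returns true on a standby.

```shell
#!/bin/sh
# PSQL may be overridden; the default is the path used in the configuration.
PSQL=${PSQL:-/usr/pgsql-9.2/bin/psql}

# Hypothetical helper: print "standby" or "primary" for the local server.
# Assumes the "postgres" user can connect locally on port 5432.
pg_role() {
    # -A: unaligned output, -t: tuples only, -c: run a single command
    answer=$("$PSQL" -U postgres -p 5432 -Atc "SELECT pg_is_in_recovery()") || return 1
    case "$answer" in
        t) echo standby ;;
        f) echo primary ;;
        *) return 1 ;;
    esac
}
```

Run on each host, one node should report "primary" and the other "standby"
when streaming replication is set up by hand.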
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
