We're testing the pgsql master/slave streaming replication resource agent found here:
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql

Using the example 2-node configuration found here as a template:
https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication
we came up with the following configuration:

node node1
node node2
primitive pgsql ocf:heartbeat:pgsql \
    params pgctl="/usr/pgsql-9.2/bin/pg_ctl" psql="/usr/pgsql-9.2/bin/psql" pgdata="/var/lib/pgsql/9.2/data/" start_opt="-p 5432" rep_mode="async" node_list="node1 node2" repuser="replicauser" restore_command="rsync -aq /var/lib/pgsql/wal_archive/%f %p" master_ip="192.168.253.104" stop_escalate="0" \
    op start interval="0s" role="Master" timeout="60s" on-fail="block"
primitive vip-master ocf:heartbeat:IPaddr2 \
    params ip="192.168.254.104" nic="eth0" cidr_netmask="24" \
    op start interval="0s" timeout="60s" on-fail="block"
primitive vip-rep ocf:heartbeat:IPaddr2 \
    params ip="192.168.253.104" nic="eth1" cidr_netmask="24" migration-threshold="0" \
    op start interval="0s" timeout="60s" on-fail="block"
group master-group vip-master vip-rep
ms msPostgresql pgsql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation colocation-master-group-msPostgresql-INFINITY inf: master-group msPostgresql:Master
order order-msPostgresql-master-group-mandatory : msPostgresql:promote master-group:start symmetrical=false
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="cman" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    migration-threshold="1"
rsc_defaults $id="rsc_defaults-options" \
    resource-stickiness="INFINITY" \
    migration-threshold="1"

Unfortunately, when the cluster is started, we end up with two slaves and neither gets promoted:

Last updated: Mon Jul 8 17:29:10 2013
Last change: Mon Jul 8 17:28:28 2013 via cibadmin on node1
Stack: cman
Current DC: node1 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
4 Resources configured.

Online: [ node1 node2 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Slaves: [ node1 node2 ]
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Stopped
     vip-rep (ocf::heartbeat:IPaddr2): Stopped

Node Attributes:
* Node node1:
    + master-pgsql : -INFINITY
    + pgsql-status : HS:alone
* Node node2:
    + master-pgsql : -INFINITY
    + pgsql-status : HS:alone

Migration summary:
* Node node1:
* Node node2:

Here are the package versions:

rpm -q pacemaker crmsh resource-agents cman corosync
pacemaker-1.1.8-7.el6.x86_64
crmsh-1.2.5-55.8.x86_64
resource-agents-3.9.2-21.el6_4.3.x86_64
cman-3.0.12.1-49.el6.x86_64
corosync-1.4.1-15.el6_4.1.x86_64

I initially configured this with pcs, but installed crmsh to double-check my crm => pcs translations. I installed the pgsql resource agent by hand from the git repo, and tried several older commits as well as the latest one in case the problem was specific to one revision. Any insight would be much appreciated.

Here's the output of "grep pgsql /var/log/messages":

Jul 8 17:28:28 node1 cibadmin[16121]: notice: crm_log_args: Invoked: /usr/sbin/cibadmin --replace --xml-file pgsql_cfg
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <primitive class="ocf" id="pgsql" provider="heartbeat" type="pgsql" >
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <instance_attributes id="pgsql-instance_attributes" >
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-pgctl" name="pgctl" value="/usr/pgsql-9.2/bin/pg_ctl" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-psql" name="psql" value="/usr/pgsql-9.2/bin/psql" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-pgdata" name="pgdata" value="/var/lib/pgsql/9.2/data/" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-start_opt" name="start_opt" value="-p 5432" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-rep_mode" name="rep_mode" value="async" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-node_list" name="node_list" value="node1 node2" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-repuser" name="repuser" value="replicauser" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-restore_command" name="restore_command" value="rsync -aq /var/lib/pgsql/wal_archive/%f %p" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-master_ip" name="master_ip" value="192.168.253.104" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <nvpair id="pgsql-instance_attributes-stop_escalate" name="stop_escalate" value="0" />
Jul 8 17:28:28 node1 cib[22549]: notice: cib:diff: ++ <op id="pgsql-interval-0s" interval="0s" name="start" on-fail="block" role="Master" timeout="60s" />
Jul 8 17:28:28 node1 attrd[22552]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-pgsql (-INFINITY)
Jul 8 17:28:28 node1 attrd[22552]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (UNKNOWN)
Jul 8 17:28:28 node1 pengine[22553]: notice: LogActions: Start pgsql:0#011(node1)
Jul 8 17:28:28 node1 pengine[22553]: notice: LogActions: Start pgsql:1#011(node2)
Jul 8 17:28:28 node1 attrd[22552]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
Jul 8 17:28:28 node1 attrd[22552]: notice: attrd_perform_update: Sent update 1054: pgsql-status=STOP
Jul 8 17:28:29 node1 attrd[22552]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
Jul 8 17:28:29 node1 attrd[22552]: notice: attrd_perform_update: Sent update 1064: pgsql-status=HS:alone
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 INFO: pgsql_replication_start ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 INFO: Changing pgsql-status on node1 : UNKNOWN->STOP. ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 INFO: server starting ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 INFO: PostgreSQL start command sent. ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ psql: could not connect to server: No such file or directory ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ #011Is the server running locally and accepting ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ #011connections on Unix domain socket "/tmp/.s.PGSQL.5432"? ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 WARNING: Can't get PostgreSQL recovery status. rc=2 ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:28 WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command. ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:29 INFO: PostgreSQL is started. ]
Jul 8 17:28:29 node1 lrmd[22551]: notice: operation_finished: pgsql_start_0:16123 [ 2013/07/08_17:28:29 INFO: Changing pgsql-status on node1 : STOP->HS:alone. ]
Jul 8 17:28:30 node1 crmd[22554]: notice: process_lrm_event: LRM operation pgsql_start_0 (call=160, rc=0, cib-update=1105, confirmed=true) ok
Jul 8 17:28:30 node1 crmd[22554]: notice: process_lrm_event: LRM operation pgsql_notify_0 (call=163, rc=0, cib-update=0, confirmed=true) ok
Jul 8 17:28:28 node2 attrd[19741]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-pgsql (-INFINITY)
Jul 8 17:28:28 node2 attrd[19741]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (UNKNOWN)
Jul 8 17:28:28 node2 attrd[19741]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
Jul 8 17:28:28 node2 attrd[19741]: notice: attrd_perform_update: Sent update 1252: pgsql-status=STOP
Jul 8 17:28:29 node2 attrd[19741]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
Jul 8 17:28:29 node2 attrd[19741]: notice: attrd_perform_update: Sent update 1256: pgsql-status=HS:alone
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 INFO: pgsql_replication_start ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 INFO: Changing pgsql-status on node2 : UNKNOWN->STOP. ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 INFO: server starting ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 INFO: PostgreSQL start command sent. ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ psql: could not connect to server: No such file or directory ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ #011Is the server running locally and accepting ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ #011connections on Unix domain socket "/tmp/.s.PGSQL.5432"? ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 WARNING: Can't get PostgreSQL recovery status. rc=2 ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:28 WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command. ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:29 INFO: PostgreSQL is started. ]
Jul 8 17:28:29 node2 lrmd[19740]: notice: operation_finished: pgsql_start_0:10626 [ 2013/07/08_17:28:29 INFO: Changing pgsql-status on node2 : STOP->HS:alone. ]
Jul 8 17:28:30 node2 crmd[19743]: notice: process_lrm_event: LRM operation pgsql_start_0 (call=118, rc=0, cib-update=145, confirmed=true) ok
Jul 8 17:28:30 node2 crmd[19743]: notice: process_lrm_event: LRM operation pgsql_notify_0 (call=121, rc=0, cib-update=0, confirmed=true) ok

Finally, I confirmed that streaming replication works between these hosts when it is set up outside of Pacemaker.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
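A small Python sketch for checking the two things that bear on promotion in the output above. The helper names (master_scores, ops_by_primitive, report) are hypothetical and not part of Pacemaker, crmsh, or the pgsql agent; the parsing assumes the crm_mon 1.1.8 and "crm configure show" text layout shown in this post and is deliberately naive. It reports which nodes carry a negative master-pgsql score (Pacemaker will not promote an instance whose master score is negative, so with -INFINITY on both nodes both stay Slave) and which ops the pgsql primitive declares. The configuration above declares only a start op and no monitor op for pgsql, which may be worth comparing against the example configuration.

```python
import re


def master_scores(crm_mon_text, attr="master-pgsql"):
    """Return {node: score} for `attr` from crm_mon's 'Node Attributes' section.

    Assumes the crm_mon 1.1.x layout shown in the post:
        * Node node1:
            + master-pgsql : -INFINITY
    """
    scores = {}
    node = None
    for raw in crm_mon_text.splitlines():
        line = raw.strip()
        header = re.match(r"\* Node (\S+):$", line)
        if header:
            node = header.group(1)
            continue
        entry = re.match(r"\+ (\S+)\s*:\s*(\S+)$", line)
        if entry and node is not None and entry.group(1) == attr:
            value = entry.group(2)
            # Pacemaker prints infinite scores as the strings INFINITY / -INFINITY.
            if value in ("INFINITY", "+INFINITY"):
                scores[node] = float("inf")
            elif value == "-INFINITY":
                scores[node] = float("-inf")
            else:
                scores[node] = float(int(value))
    return scores


def ops_by_primitive(crm_config_text):
    """Return {primitive id: [op names]} from `crm configure show` style text.

    Naive on purpose: backslash continuations are joined into one logical
    line, then every 'op <name>' token on a primitive's line is collected.
    A quoted parameter value containing ' op ' would confuse it.
    """
    logical = re.sub(r"\\\s*\n\s*", " ", crm_config_text)
    ops = {}
    for line in logical.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == "primitive":
            ops[words[1]] = re.findall(r"\bop\s+(\S+)", line)
    return ops


def report(crm_mon_text, crm_config_text, rsc="pgsql"):
    """Summarize master scores and declared ops for `rsc` (hypothetical helper)."""
    scores = master_scores(crm_mon_text, attr="master-" + rsc)
    # A negative master score means Pacemaker will not promote that instance.
    blocked = sorted(node for node, score in scores.items() if score < 0)
    ops = ops_by_primitive(crm_config_text).get(rsc, [])
    return {"blocked": blocked, "ops": ops, "has_monitor": "monitor" in ops}


if __name__ == "__main__":
    mon = """Node Attributes:
* Node node1:
    + master-pgsql : -INFINITY
    + pgsql-status : HS:alone
* Node node2:
    + master-pgsql : -INFINITY
    + pgsql-status : HS:alone
"""
    cfg = """primitive pgsql ocf:heartbeat:pgsql \\
    params pgdata="/var/lib/pgsql/9.2/data/" rep_mode="async" \\
    op start interval="0s" role="Master" timeout="60s" on-fail="block"
primitive vip-rep ocf:heartbeat:IPaddr2 \\
    params ip="192.168.253.104" nic="eth1" cidr_netmask="24" \\
    op start interval="0s" timeout="60s" on-fail="block"
"""
    print(report(mon, cfg))
```

With the trimmed samples above it prints {'blocked': ['node1', 'node2'], 'ops': ['start'], 'has_monitor': False}. Pasting the full crm_mon and crm configure show output into mon and cfg should give the same picture for this cluster.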
