1. Of course, I did that. For now, PostgreSQL replication has been cleaned up, and both standby servers are replicating in async mode.
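For anyone who wants to script the same check, here is a minimal sketch in Python. It assumes the default aligned output of the `select client_addr,sync_state from pg_stat_replication;` query quoted further down in this thread. The function names `parse_sync_states` and `all_async` are hypothetical helpers made up for this illustration; they are not part of PostgreSQL, Pacemaker, or the pgsql RA.

```python
def parse_sync_states(psql_output):
    """Map client_addr -> sync_state from psql's default aligned table output."""
    states = {}
    for line in psql_output.splitlines():
        # Data rows look like " 192.168.10.2 | async".
        # The rule line uses "+" and the "(2 rows)" footer has no "|", so both are skipped.
        if "|" not in line:
            continue
        addr, state = (part.strip() for part in line.split("|", 1))
        if addr == "client_addr":
            # Column header row, not a standby.
            continue
        states[addr] = state
    return states


def all_async(psql_output):
    """True only if at least one standby is listed and every one is async."""
    states = parse_sync_states(psql_output)
    return bool(states) and all(s == "async" for s in states.values())


# Sample text copied from the psql session quoted in this thread.
SAMPLE = """\
 client_addr  | sync_state
--------------+------------
 192.168.10.2 | async
 192.168.10.3 | async
(2 rows)
"""

if __name__ == "__main__":
    print(parse_sync_states(SAMPLE))
    print("all standbys async:", all_async(SAMPLE))
```

This only looks at what PostgreSQL itself reports. It says nothing about the `pgsql-data-status` attributes or the failed monitor actions that crm_mon shows, so those still need to be checked separately.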
2013/12/13 Takehiro Matsushima <[email protected]>

> 1. Well, it means rebuilding PostgreSQL replication cluster by using
> pg_basebackup or rsync or something.
> 2. Thanks, but I'll try first.
>
> 2013/12/14 Andrey Rogovsky <[email protected]>:
> > 1. You meant crm resource cleanup or something else?
> >
> > 2. If you want - I can give you logs.
> >
> >
> > 2013/12/13 Takehiro Matsushima <[email protected]>
> >
> >> 1. Temporarily, how about cleanup completely all nodes once? like
> >> master is "a", slaves are "b" and "c".
> >>
> >> 2. It looks like it caused by RA... umm... I'll try building a cluster
> >> on Debian 7.
> >>
> >> 2013/12/14 Andrey Rogovsky <[email protected]>:
> >> > 1. How I can find status in the log? What exactly I need search in?
> >> >
> >> > 2. I did it and have this situation:
> >> > On a node:
> >> > root@a:~# sudo -u postgres psql
> >> > could not change directory to "/root": Permission denied
> >> > psql (9.3.2)
> >> > Type "help" for help.
> >> >
> >> > postgres=# select client_addr,sync_state from pg_stat_replication;
> >> >  client_addr  | sync_state
> >> > --------------+------------
> >> >  192.168.10.2 | async
> >> >  192.168.10.3 | async
> >> > (2 rows)
> >> >
> >> > So, pgsql is correct. But...
> >> > root@a:~# crm_mon -VAf -1
> >> > crm_mon[16456]: 2013/12/13_22:15:30 ERROR: unpack_rsc_op: Preventing msPostgresql from re-starting on a.mydomain.com: operation monitor failed 'invalid parameter' (rc=2)
> >> > crm_mon[16456]: 2013/12/13_22:15:30 ERROR: unpack_rsc_op: Preventing msPostgresql from re-starting on b.mydomain.com: operation monitor failed 'invalid parameter' (rc=2)
> >> > crm_mon[16456]: 2013/12/13_22:15:30 ERROR: unpack_rsc_op: Preventing msPostgresql from re-starting on c.mydomain.com: operation monitor failed 'invalid parameter' (rc=2)
> >> > ============
> >> > Last updated: Fri Dec 13 22:15:30 2013
> >> > Last change: Fri Dec 13 20:48:18 2013 via crmd on c.mydomain.com
> >> > Stack: openais
> >> > Current DC: a.mydomain.com - partition with quorum
> >> > Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> >> > 3 Nodes configured, 3 expected votes
> >> > 6 Resources configured.
> >> > ============
> >> >
> >> > Online: [ a.mydomain.com c.mydomain.com b.mydomain.com ]
> >> >
> >> >  apache-master-ip (ocf::heartbeat:IPaddr2): Started a.mydomain.com
> >> >  apache (ocf::heartbeat:apache): Started a.mydomain.com
> >> >
> >> > Node Attributes:
> >> > * Node a.mydomain.com:
> >> >     + pgsql-data-status : LATEST
> >> > * Node c.mydomain.com:
> >> >     + pgsql-data-status : STREAMING|ASYNC
> >> >     + pgsql-status : HS:async
> >> > * Node b.mydomain.com:
> >> >     + pgsql-data-status : STREAMING|ASYNC
> >> >     + pgsql-status : HS:async
> >> >
> >> > Migration summary:
> >> > * Node a.mydomain.com:
> >> > * Node b.mydomain.com:
> >> > * Node c.mydomain.com:
> >> >
> >> > Failed actions:
> >> >     pgsql:0_monitor_0 (node=a.mydomain.com, call=31, rc=2, status=complete): invalid parameter
> >> >     pgsql:0_monitor_0 (node=b.mydomain.com, call=26, rc=2, status=complete): invalid parameter
> >> >     pgsql:0_monitor_0 (node=c.mydomain.com, call=22, rc=2, status=complete): invalid parameter
> >> >
> >> > root@a:~#
> >> >
> >> > How I can fix it?
> >> >
> >> >
> >> > 2013/12/13 Takehiro Matsushima <[email protected]>
> >> >
> >> >> 1. Excuse me, could you tell me status before a.mydomain.com fails?
> >> >>
> >> >> 2. Sorry, replace rep_mode="sync" with rep_mode="async" defined in
> >> >> primitive pgsql.
> >> >>
> >> >> 2013/12/14 Andrey Rogovsky <[email protected]>:
> >> >> > 1. If fall down:
> >> >> > ============
> >> >> > Last updated: Fri Dec 13 19:06:51 2013
> >> >> > Last change: Fri Dec 13 10:06:49 2013 via cibadmin on a.mydomain.com
> >> >> > Stack: openais
> >> >> > Current DC: c.mydomain.com - partition with quorum
> >> >> > Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> >> >> > 3 Nodes configured, 3 expected votes
> >> >> > 6 Resources configured.
> >> >> > ============
> >> >> >
> >> >> > Online: [ a.mydomain.com c.mydomain.com b.mydomain.com ]
> >> >> >
> >> >> > Full list of resources:
> >> >> >
> >> >> >  Resource Group: master
> >> >> >      pgsql-master-ip (ocf::heartbeat:IPaddr2): Started b.mydomain.com
> >> >> >  Master/Slave Set: msPostgresql [pgsql]
> >> >> >      Masters: [ b.mydomain.com ]
> >> >> >      Slaves: [ c.mydomain.com ]
> >> >> >      Stopped: [ pgsql:0 ]
> >> >> >  apache-master-ip (ocf::heartbeat:IPaddr2): Started b.mydomain.com
> >> >> >  apache (ocf::heartbeat:apache): Started b.mydomain.com
> >> >> >
> >> >> > Node Attributes:
> >> >> > * Node a.mydomain.com:
> >> >> >     + master-pgsql:0 : -INFINITY
> >> >> >     + master-pgsql:1 : 1000
> >> >> >     + pgsql-data-status : DISCONNECT
> >> >> >     + pgsql-status : STOP
> >> >> > * Node c.mydomain.com:
> >> >> >     + master-pgsql:2 : 100
> >> >> >     + pgsql-data-status : STREAMING|SYNC
> >> >> >     + pgsql-status : HS:sync
> >> >> > * Node b.mydomain.com:
> >> >> >     + master-pgsql:0 : -INFINITY
> >> >> >     + master-pgsql:1 : 1000
> >> >> >     + pgsql-data-status : LATEST
> >> >> >     + pgsql-master-baseline : 000000000F000090
> >> >> >     + pgsql-status : PRI
> >> >> >
> >> >> > Migration summary:
> >> >> > * Node a.mydomain.com:
> >> >> >    pgsql:0: migration-threshold=1 fail-count=1
> >> >> > * Node c.mydomain.com:
> >> >> > * Node b.mydomain.com:
> >> >> >
> >> >> > Failed actions:
> >> >> >     pgsql:0_monitor_4000 (node=a.mydomain.com, call=89, rc=7, status=complete): not running
> >> >> >
> >> >> > This is in the log file on a node:
> >> >> > Dec 10 20:49:57 a pgsql[903]: INFO: Don't check /var/lib/postgresql/9.3/main during probe
> >> >> > Dec 10 20:49:57 a crmd: [893]: info: process_lrm_event: LRM operation pgsql-master-ip_monitor_0 (call=2, rc=7, cib-update=7, confirmed=true) not running
> >> >> > Dec 10 20:49:57 a pgsql[903]: INFO: PostgreSQL is down
> >> >> > Dec 10 20:49:57 a lrmd: [890]: info: operation monitor[3] on pgsql:1 for client 893: pid 903 exited with return code 7
> >> >> > Dec 10 20:49:57 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_monitor_0 (call=3, rc=7, cib-update=8, confirmed=true) not running
> >> >> > Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> >> >> > Dec 10 20:49:57 a lrmd: [890]: info: rsc:pgsql:1 start[4] (pid 986)
> >> >> > Dec 10 20:49:57 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->STOP.
> >> >> > Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
> >> >> > Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-pgsql:1 (-INFINITY)
> >> >> > Dec 10 20:49:57 a pgsql[986]: INFO: Set all nodes into async mode.
> >> >> > Dec 10 20:49:57 a pgsql[986]: INFO: server starting
> >> >> > Dec 10 20:49:57 a pgsql[986]: INFO: PostgreSQL start command sent.
> >> >> > Dec 10 20:49:58 a lrmd: [890]: info: RA output: (pgsql:1:start:stderr) psql: FATAL: the database system is starting up
> >> >> > Dec 10 20:49:58 a pgsql[986]: WARNING: Can't get PostgreSQL recovery status. rc=2
> >> >> > Dec 10 20:49:58 a pgsql[986]: WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command.
> >> >> > Dec 10 20:49:59 a pgsql[986]: INFO: PostgreSQL is started.
> >> >> > Dec 10 20:49:59 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->HS:alone.
> >> >> > Dec 10 20:49:59 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
> >> >> > Dec 10 20:49:59 a lrmd: [890]: info: operation start[4] on pgsql:1 for client 893: pid 986 exited with return code 0
> >> >> > Dec 10 20:49:59 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_start_0 (call=4, rc=0, cib-update=9, confirmed=true) ok
> >> >> > Dec 10 20:49:59 a lrmd: [890]: info: rsc:pgsql:1 notify[5] (pid 1163)
> >> >> > Dec 10 20:49:59 a lrmd: [890]: info: operation notify[5] on pgsql:1 for client 893: pid 1163 exited with return code 0
> >> >> > Dec 10 20:49:59 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_notify_0 (call=5, rc=0, cib-update=0, confirmed=true) ok
> >> >> > Dec 10 20:49:59 a lrmd: [890]: info: rsc:pgsql:1 monitor[6] (pid 1207)
> >> >> > Dec 10 20:49:59 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
> >> >> >
> >> >> > I think it is wrong, because there are 2 live nodes. One can stay as master.
> >> >> >
> >> >> > Also this is in postgresql log on a node:
> >> >> > 2013-12-06 10:56:53 MSK WARNING: archive_mode enabled, yet archive_command is not set
> >> >> > 2013-12-06 10:57:37 MSK LOG: received SIGHUP, reloading configuration files
> >> >> > 2013-12-06 10:57:37 MSK LOG: parameter "archive_command" changed to "cp %p /var/lib/postgresql/9.3/pg_archive/%f"
> >> >> > 2013-12-06 10:57:43 MSK ERROR: a backup is not in progress
> >> >> > 2013-12-06 10:57:43 MSK STATEMENT: SELECT pg_stop_backup()
> >> >> > 2013-12-07 10:24:22 MSK LOG: received fast shutdown request
> >> >> > 2013-12-07 10:24:22 MSK LOG: aborting any active transactions
> >> >> > 2013-12-07 10:24:22 MSK LOG: autovacuum launcher shutting down
> >> >> > 2013-12-07 10:24:22 MSK LOG: shutting down
> >> >> > 2013-12-07 10:24:22 MSK LOG: database system is shut down
> >> >> > 2013-12-07 10:24:29 MSK LOG: database system was shut down at 2013-12-07 10:24:22 MSK
> >> >> > 2013-12-07 10:24:29 MSK LOG: autovacuum launcher started
> >> >> > 2013-12-07 10:24:29 MSK LOG: database system is ready to accept connections
> >> >> > 2013-12-07 10:24:29 MSK LOG: incomplete startup packet
> >> >> > 2013-12-07 10:24:34 MSK LOG: received fast shutdown request
> >> >> > 2013-12-07 10:24:34 MSK LOG: aborting any active transactions
> >> >> > 2013-12-07 10:24:34 MSK LOG: autovacuum launcher shutting down
> >> >> > 2013-12-07 10:24:34 MSK LOG: shutting down
> >> >> > 2013-12-07 10:24:34 MSK LOG: database system is shut down
> >> >> > 2013-12-07 14:31:11 MSK LOG: database system was shut down in recovery at 2013-12-07 14:29:19 MSK
> >> >> > cp: cannot stat `/var/lib/postgresql/9.3/pg_archive/00000002.history': No such file or directory
> >> >> > 2013-12-07 14:31:11 MSK LOG: entering standby mode
> >> >> > cp: cannot stat `/var/lib/postgresql/9.3/pg_archive/000000010000000000000007': No such file or directory
> >> >> > 2013-12-07 14:31:11 MSK LOG: consistent recovery state reached at 0/7000090
> >> >> > 2013-12-07 14:31:11 MSK LOG: record with zero length at 0/7000090
> >> >> > 2013-12-07 14:31:11 MSK LOG: database system is ready to accept read only connections
> >> >> > 2013-12-07 14:31:12 MSK LOG: incomplete startup packet
> >> >> > 2013-12-07 14:31:14 MSK FATAL: could not connect to the primary server: could not connect to server: No route to host
> >> >> >         Is the server running on host "192.168.10.200" and accepting TCP/IP connections on port 5432?
> >> >> >
> >> >> > Why master not got shutdown request? It is life.
> >> >> >
> >> >> >
> >> >> > 2. There is my config:
> >> >> > node a.mydomain.com \
> >> >> >         attributes pgsql-data-status="DISCONNECT"
> >> >> > node b.mydomain.com \
> >> >> >         attributes pgsql-data-status="LATEST" pgsql-status="HS:async"
> >> >> > node c.mydomain.com \
> >> >> >         attributes pgsql-data-status="STREAMING|SYNC" pgsql-status="HS:async"
> >> >> > primitive apache ocf:heartbeat:apache \
> >> >> >         params configfile="/etc/apache2/apache2.conf" \
> >> >> >         op monitor interval="1min"
> >> >> > primitive apache-master-ip ocf:heartbeat:IPaddr2 \
> >> >> >         params ip="192.168.10.100" nic="peervpn0" \
> >> >> >         op monitor interval="30s"
> >> >> > primitive pgsql ocf:heartbeat:pgsql \
> >> >> >         params pgctl="/usr/lib/postgresql/9.3/bin/pg_ctl" psql="/usr/bin/psql" pgdata="/var/lib/postgresql/9.3/main" start_opt="-p 5432" rep_mode="sync" node_list="a.mydomain.com b.mydomain.com c.mydomain.com" restore_command="cp /var/lib/postgresql/9.3/pg_archive/%f %p" master_ip="192.168.10.200" restart_on_promote="true" config="/etc/postgresql/9.3/main/postgresql.conf" \
> >> >> >         op start interval="0s" timeout="60s" on-fail="restart" \
> >> >> >         op monitor interval="4s" timeout="60s" on-fail="restart" \
> >> >> >         op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
> >> >> >         op promote interval="0s" timeout="60s" on-fail="restart" \
> >> >> >         op demote interval="0s" timeout="60s" on-fail="stop" \
> >> >> >         op stop interval="0s" timeout="60s" on-fail="block" \
> >> >> >         op notify interval="0s" timeout="60s"
> >> >> > primitive pgsql-master-ip ocf:heartbeat:IPaddr2 \
> >> >> >         params ip="192.168.10.200" nic="peervpn0" \
> >> >> >         op start interval="0s" timeout="60s" on-fail="restart" \
> >> >> >         op monitor interval="10s" timeout="60s" on-fail="restart" \
> >> >> >         op stop interval="0s" timeout="60s" on-fail="block" \
> >> >> >         meta target-role="Started"
> >> >> > group master pgsql-master-ip
> >> >> > ms msPostgresql pgsql \
> >> >> >         meta master-max="1" master-node-max="1" clone-max="3" clone-node-max="1" target-role="Master" notify="true"
> >> >> > location prefer-apache-node apache 150: b.mydomain.com
> >> >> > colocation apache-with-ip inf: apache apache-master-ip
> >> >> > colocation set_ip inf: master msPostgresql:Master
> >> >> > order apache-after-ip inf: apache-master-ip apache
> >> >> > order ip_down 0: msPostgresql:demote master:stop symmetrical=false
> >> >> > order ip_up 0: msPostgresql:promote master:start symmetrical=false
> >> >> > property $id="cib-bootstrap-options" \
> >> >> >         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
> >> >> >         cluster-infrastructure="openais" \
> >> >> >         expected-quorum-votes="3" \
> >> >> >         stonith-enabled="false" \
> >> >> >         crmd-transition-delay="0" \
> >> >> >         last-lrm-refresh="1386751770"
> >> >> > rsc_defaults $id="rsc-options" \
> >> >> >         resource-stickiness="100" \
> >> >> >         migration-threshold="1"
> >> >> >
> >> >> > Where I will add rep_mode="async"? In each slave node attributes?
> >> >> >
> >> >> >
> >> >> > 2013/12/13 Takehiro Matsushima <[email protected]>
> >> >> >
> >> >> >> Hello,
> >> >> >>
> >> >> >> 1. How is it work stably after that? Failover works correctly, too?
> >> >> >>
> >> >> >> 2. I see, in this case, specify rep_mode="async" in crm config then
> >> >> >> all slaves run in async.
> >> >> >>
> >> >> >> --
> >> >> >> Regards,
> >> >> >> Takehiro Matsushima
> >> >> >> _______________________________________________
> >> >> >> Linux-HA mailing list
> >> >> >> [email protected]
> >> >> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> >> >> See also: http://linux-ha.org/ReportingProblems
