Re: [Linux-HA] How to tell master-slave group set one node to master?

Takehiro Matsushima Fri, 13 Dec 2013 10:57:39 -0800

1. As a temporary measure, how about completely cleaning up all nodes
once, so that the master is "a" and the slaves are "b" and "c"?
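
A minimal sketch of what such a cleanup could look like with the crm
shell. The resource name msPostgresql and the node names are taken from
the configuration quoted below; the helper cleanup_cmds is hypothetical
and only prints the commands (a dry run), so nothing is changed until
the output has been reviewed.

```shell
#!/bin/sh
# Dry-run sketch: print one cleanup command per node instead of running it.
# RSC and NODES are assumptions copied from the quoted crm configuration.
RSC="msPostgresql"
NODES="a.mydomain.com b.mydomain.com c.mydomain.com"

cleanup_cmds() {
    for node in $NODES; do
        # "crm resource cleanup <rsc> <node>" clears the recorded failed
        # actions and the fail count for that resource on that node.
        echo "crm resource cleanup $RSC $node"
    done
}

cleanup_cmds
```

Once the printed commands look right, they could be run on a cluster
node with "cleanup_cmds | sh".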


2. It looks like this is caused by the RA (resource agent). I'll try
building a cluster on Debian 7.

2013/12/14 Andrey Rogovsky <[email protected]>:
> 1. How can I find the status in the log? What exactly do I need to search for?
>
> 2. I did it and have this situation:
> On a node:
> root@a:~# sudo -u postgres psql
> could not change directory to "/root": Permission denied
> psql (9.3.2)
> Type "help" for help.
>
> postgres=# select client_addr,sync_state from pg_stat_replication;
>  client_addr  | sync_state
> --------------+------------
>  192.168.10.2 | async
>  192.168.10.3 | async
> (2 rows)
>
> So, pgsql is correct. But...
> root@a:~# crm_mon -VAf -1
> crm_mon[16456]: 2013/12/13_22:15:30 ERROR: unpack_rsc_op: Preventing
> msPostgresql from re-starting on a.mydomain.com: operation monitor failed
> 'invalid parameter' (rc=2)
> crm_mon[16456]: 2013/12/13_22:15:30 ERROR: unpack_rsc_op: Preventing
> msPostgresql from re-starting on b.mydomain.com: operation monitor failed
> 'invalid parameter' (rc=2)
> crm_mon[16456]: 2013/12/13_22:15:30 ERROR: unpack_rsc_op: Preventing
> msPostgresql from re-starting on c.mydomain.com: operation monitor failed
> 'invalid parameter' (rc=2)
> ============
> Last updated: Fri Dec 13 22:15:30 2013
> Last change: Fri Dec 13 20:48:18 2013 via crmd on c.mydomain.com
> Stack: openais
> Current DC: a.mydomain.com - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 3 Nodes configured, 3 expected votes
> 6 Resources configured.
> ============
>
> Online: [ a.mydomain.com c.mydomain.com b.mydomain.com ]
>
>  apache-master-ip (ocf::heartbeat:IPaddr2): Started a.mydomain.com
>  apache (ocf::heartbeat:apache): Started a.mydomain.com
>
> Node Attributes:
> * Node a.mydomain.com:
>     + pgsql-data-status               : LATEST
> * Node c.mydomain.com:
>     + pgsql-data-status               : STREAMING|ASYNC
>     + pgsql-status                     : HS:async
> * Node b.mydomain.com:
>     + pgsql-data-status               : STREAMING|ASYNC
>     + pgsql-status                     : HS:async
>
> Migration summary:
> * Node a.mydomain.com:
> * Node b.mydomain.com:
> * Node c.mydomain.com:
>
> Failed actions:
>     pgsql:0_monitor_0 (node=a.mydomain.com, call=31, rc=2,
> status=complete): invalid parameter
>     pgsql:0_monitor_0 (node=b.mydomain.com, call=26, rc=2,
> status=complete): invalid parameter
>     pgsql:0_monitor_0 (node=c.mydomain.com, call=22, rc=2,
> status=complete): invalid parameter
> root@a:~#
>
> How can I fix it?
>
>
>
> 2013/12/13 Takehiro Matsushima <[email protected]>
>
>> 1. Excuse me, could you tell me the status before a.mydomain.com failed?
>>
>> 2. Sorry, I meant: replace rep_mode="sync" with rep_mode="async" in the
>> definition of primitive pgsql.
>>
>> 2013/12/14 Andrey Rogovsky <[email protected]>:
>> > 1. After it goes down:
>> > ============
>> > Last updated: Fri Dec 13 19:06:51 2013
>> > Last change: Fri Dec 13 10:06:49 2013 via cibadmin on a.mydomain.com
>> > Stack: openais
>> > Current DC: c.mydomain.com - partition with quorum
>> > Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> > 3 Nodes configured, 3 expected votes
>> > 6 Resources configured.
>> > ============
>> >
>> > Online: [ a.mydomain.com c.mydomain.com b.mydomain.com ]
>> >
>> > Full list of resources:
>> >
>> >  Resource Group: master
>> >      pgsql-master-ip (ocf::heartbeat:IPaddr2): Started b.mydomain.com
>> >  Master/Slave Set: msPostgresql [pgsql]
>> >      Masters: [ b.mydomain.com ]
>> >      Slaves: [ c.mydomain.com ]
>> >      Stopped: [ pgsql:0 ]
>> >  apache-master-ip (ocf::heartbeat:IPaddr2): Started b.mydomain.com
>> >  apache (ocf::heartbeat:apache): Started b.mydomain.com
>> >
>> > Node Attributes:
>> > * Node a.mydomain.com:
>> >     + master-pgsql:0                   : -INFINITY
>> >     + master-pgsql:1                   : 1000
>> >     + pgsql-data-status               : DISCONNECT
>> >     + pgsql-status                     : STOP
>> > * Node c.mydomain.com:
>> >     + master-pgsql:2                   : 100
>> >     + pgsql-data-status               : STREAMING|SYNC
>> >     + pgsql-status                     : HS:sync
>> > * Node b.mydomain.com:
>> >     + master-pgsql:0                   : -INFINITY
>> >     + master-pgsql:1                   : 1000
>> >     + pgsql-data-status               : LATEST
>> >     + pgsql-master-baseline           : 000000000F000090
>> >     + pgsql-status                     : PRI
>> >
>> > Migration summary:
>> > * Node a.mydomain.com:
>> >    pgsql:0: migration-threshold=1 fail-count=1
>> > * Node c.mydomain.com:
>> > * Node b.mydomain.com:
>> >
>> > Failed actions:
>> >     pgsql:0_monitor_4000 (node=a.mydomain.com, call=89, rc=7,
>> > status=complete): not running
>> >
>> > This is in the log file on a node:
>> > Dec 10 20:49:57 a pgsql[903]: INFO: Don't check /var/lib/postgresql/9.3/main during probe
>> > Dec 10 20:49:57 a crmd: [893]: info: process_lrm_event: LRM operation pgsql-master-ip_monitor_0 (call=2, rc=7, cib-update=7, confirmed=true) not running
>> > Dec 10 20:49:57 a pgsql[903]: INFO: PostgreSQL is down
>> > Dec 10 20:49:57 a lrmd: [890]: info: operation monitor[3] on pgsql:1 for client 893: pid 903 exited with return code 7
>> > Dec 10 20:49:57 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_monitor_0 (call=3, rc=7, cib-update=8, confirmed=true) not running
>> > Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>> > Dec 10 20:49:57 a lrmd: [890]: info: rsc:pgsql:1 start[4] (pid 986)
>> > Dec 10 20:49:57 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->STOP.
>> > Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
>> > Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-pgsql:1 (-INFINITY)
>> > Dec 10 20:49:57 a pgsql[986]: INFO: Set all nodes into async mode.
>> > Dec 10 20:49:57 a pgsql[986]: INFO: server starting
>> > Dec 10 20:49:57 a pgsql[986]: INFO: PostgreSQL start command sent.
>> > Dec 10 20:49:58 a lrmd: [890]: info: RA output: (pgsql:1:start:stderr) psql: FATAL:  the database system is starting up
>> > Dec 10 20:49:58 a pgsql[986]: WARNING: Can't get PostgreSQL recovery status. rc=2
>> > Dec 10 20:49:58 a pgsql[986]: WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command.
>> > Dec 10 20:49:59 a pgsql[986]: INFO: PostgreSQL is started.
>> > Dec 10 20:49:59 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->HS:alone.
>> > Dec 10 20:49:59 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
>> > Dec 10 20:49:59 a lrmd: [890]: info: operation start[4] on pgsql:1 for client 893: pid 986 exited with return code 0
>> > Dec 10 20:49:59 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_start_0 (call=4, rc=0, cib-update=9, confirmed=true) ok
>> > Dec 10 20:49:59 a lrmd: [890]: info: rsc:pgsql:1 notify[5] (pid 1163)
>> > Dec 10 20:49:59 a lrmd: [890]: info: operation notify[5] on pgsql:1 for client 893: pid 1163 exited with return code 0
>> > Dec 10 20:49:59 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_notify_0 (call=5, rc=0, cib-update=0, confirmed=true) ok
>> > Dec 10 20:49:59 a lrmd: [890]: info: rsc:pgsql:1 monitor[6] (pid 1207)
>> > Dec 10 20:49:59 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
>> >
>> > I think this is wrong, because there are 2 live nodes. One can stay as master.
>> >
>> > Also this is in postgresql log on a node:
>> > 2013-12-06 10:56:53 MSK WARNING:  archive_mode enabled, yet archive_command is not set
>> > 2013-12-06 10:57:37 MSK LOG:  received SIGHUP, reloading configuration files
>> > 2013-12-06 10:57:37 MSK LOG:  parameter "archive_command" changed to "cp %p /var/lib/postgresql/9.3/pg_archive/%f"
>> > 2013-12-06 10:57:43 MSK ERROR:  a backup is not in progress
>> > 2013-12-06 10:57:43 MSK STATEMENT:  SELECT pg_stop_backup()
>> > 2013-12-07 10:24:22 MSK LOG:  received fast shutdown request
>> > 2013-12-07 10:24:22 MSK LOG:  aborting any active transactions
>> > 2013-12-07 10:24:22 MSK LOG:  autovacuum launcher shutting down
>> > 2013-12-07 10:24:22 MSK LOG:  shutting down
>> > 2013-12-07 10:24:22 MSK LOG:  database system is shut down
>> > 2013-12-07 10:24:29 MSK LOG:  database system was shut down at 2013-12-07 10:24:22 MSK
>> > 2013-12-07 10:24:29 MSK LOG:  autovacuum launcher started
>> > 2013-12-07 10:24:29 MSK LOG:  database system is ready to accept connections
>> > 2013-12-07 10:24:29 MSK LOG:  incomplete startup packet
>> > 2013-12-07 10:24:34 MSK LOG:  received fast shutdown request
>> > 2013-12-07 10:24:34 MSK LOG:  aborting any active transactions
>> > 2013-12-07 10:24:34 MSK LOG:  autovacuum launcher shutting down
>> > 2013-12-07 10:24:34 MSK LOG:  shutting down
>> > 2013-12-07 10:24:34 MSK LOG:  database system is shut down
>> > 2013-12-07 14:31:11 MSK LOG:  database system was shut down in recovery at 2013-12-07 14:29:19 MSK
>> > cp: cannot stat `/var/lib/postgresql/9.3/pg_archive/00000002.history': No such file or directory
>> > 2013-12-07 14:31:11 MSK LOG:  entering standby mode
>> > cp: cannot stat `/var/lib/postgresql/9.3/pg_archive/000000010000000000000007': No such file or directory
>> > 2013-12-07 14:31:11 MSK LOG:  consistent recovery state reached at 0/7000090
>> > 2013-12-07 14:31:11 MSK LOG:  record with zero length at 0/7000090
>> > 2013-12-07 14:31:11 MSK LOG:  database system is ready to accept read only connections
>> > 2013-12-07 14:31:12 MSK LOG:  incomplete startup packet
>> > 2013-12-07 14:31:14 MSK FATAL:  could not connect to the primary server: could not connect to server: No route to host
>> >                 Is the server running on host "192.168.10.200" and accepting
>> >                 TCP/IP connections on port 5432?
>> >
>> > Why did the master not get a shutdown request? It is alive.
>> >
>> >
>> > 2. Here is my config:
>> > node a.mydomain.com \
>> >         attributes pgsql-data-status="DISCONNECT"
>> > node b.mydomain.com \
>> >         attributes pgsql-data-status="LATEST" pgsql-status="HS:async"
>> > node c.mydomain.com \
>> >         attributes pgsql-data-status="STREAMING|SYNC" pgsql-status="HS:async"
>> > primitive apache ocf:heartbeat:apache \
>> >         params configfile="/etc/apache2/apache2.conf" \
>> >         op monitor interval="1min"
>> > primitive apache-master-ip ocf:heartbeat:IPaddr2 \
>> >         params ip="192.168.10.100" nic="peervpn0" \
>> >         op monitor interval="30s"
>> > primitive pgsql ocf:heartbeat:pgsql \
>> >         params pgctl="/usr/lib/postgresql/9.3/bin/pg_ctl" \
>> >         psql="/usr/bin/psql" pgdata="/var/lib/postgresql/9.3/main" \
>> >         start_opt="-p 5432" rep_mode="sync" \
>> >         node_list="a.mydomain.com b.mydomain.com c.mydomain.com" \
>> >         restore_command="cp /var/lib/postgresql/9.3/pg_archive/%f %p" \
>> >         master_ip="192.168.10.200" restart_on_promote="true" \
>> >         config="/etc/postgresql/9.3/main/postgresql.conf" \
>> >         op start interval="0s" timeout="60s" on-fail="restart" \
>> >         op monitor interval="4s" timeout="60s" on-fail="restart" \
>> >         op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
>> >         op promote interval="0s" timeout="60s" on-fail="restart" \
>> >         op demote interval="0s" timeout="60s" on-fail="stop" \
>> >         op stop interval="0s" timeout="60s" on-fail="block" \
>> >         op notify interval="0s" timeout="60s"
>> > primitive pgsql-master-ip ocf:heartbeat:IPaddr2 \
>> >         params ip="192.168.10.200" nic="peervpn0" \
>> >         op start interval="0s" timeout="60s" on-fail="restart" \
>> >         op monitor interval="10s" timeout="60s" on-fail="restart" \
>> >         op stop interval="0s" timeout="60s" on-fail="block" \
>> >         meta target-role="Started"
>> > group master pgsql-master-ip
>> > ms msPostgresql pgsql \
>> >         meta master-max="1" master-node-max="1" clone-max="3" clone-node-max="1" target-role="Master" notify="true"
>> > location prefer-apache-node apache 150: b.mydomain.com
>> > colocation apache-with-ip inf: apache apache-master-ip
>> > colocation set_ip inf: master msPostgresql:Master
>> > order apache-after-ip inf: apache-master-ip apache
>> > order ip_down 0: msPostgresql:demote master:stop symmetrical=false
>> > order ip_up 0: msPostgresql:promote master:start symmetrical=false
>> > property $id="cib-bootstrap-options" \
>> >         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>> >         cluster-infrastructure="openais" \
>> >         expected-quorum-votes="3" \
>> >         stonith-enabled="false" \
>> >         crmd-transition-delay="0" \
>> >         last-lrm-refresh="1386751770"
>> > rsc_defaults $id="rsc-options" \
>> >         resource-stickiness="100" \
>> >         migration-threshold="1"
>> >
>> > Where should I add rep_mode="async"? In each slave node's attributes?
>> >
>> >
>> >
>> > 2013/12/13 Takehiro Matsushima <[email protected]>
>> >
>> >> Hello,
>> >>
>> >> 1. Does it work stably after that? Does failover work correctly, too?
>> >>
>> >> 2. I see. In this case, specify rep_mode="async" in the crm config;
>> >> then all slaves will run in async mode.
>> >>
>> >> --
>> >> Regards,
>> >> Takehiro Matsushima
>> >> _______________________________________________
>> >> Linux-HA mailing list
>> >> [email protected]
>> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> >> See also: http://linux-ha.org/ReportingProblems
>> >>
>>
>>
>>
>> --
>> Regards,
>> Takehiro Matsushima
>>



-- 
Regards,
Takehiro Matsushima
