1. When a node goes down:
============
Last updated: Fri Dec 13 19:06:51 2013
Last change: Fri Dec 13 10:06:49 2013 via cibadmin on a.mydomain.com
Stack: openais
Current DC: c.mydomain.com - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ a.mydomain.com c.mydomain.com b.mydomain.com ]
Full list of resources:
Resource Group: master
pgsql-master-ip (ocf::heartbeat:IPaddr2): Started b.mydomain.com
Master/Slave Set: msPostgresql [pgsql]
Masters: [ b.mydomain.com ]
Slaves: [ c.mydomain.com ]
Stopped: [ pgsql:0 ]
apache-master-ip (ocf::heartbeat:IPaddr2): Started b.mydomain.com
apache (ocf::heartbeat:apache): Started b.mydomain.com
Node Attributes:
* Node a.mydomain.com:
+ master-pgsql:0 : -INFINITY
+ master-pgsql:1 : 1000
+ pgsql-data-status : DISCONNECT
+ pgsql-status : STOP
* Node c.mydomain.com:
+ master-pgsql:2 : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync
* Node b.mydomain.com:
+ master-pgsql:0 : -INFINITY
+ master-pgsql:1 : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 000000000F000090
+ pgsql-status : PRI
Migration summary:
* Node a.mydomain.com:
pgsql:0: migration-threshold=1 fail-count=1
* Node c.mydomain.com:
* Node b.mydomain.com:
Failed actions:
pgsql:0_monitor_4000 (node=a.mydomain.com, call=89, rc=7, status=complete): not running
This is in the log file on node a:
Dec 10 20:49:57 a pgsql[903]: INFO: Don't check /var/lib/postgresql/9.3/main during probe
Dec 10 20:49:57 a crmd: [893]: info: process_lrm_event: LRM operation pgsql-master-ip_monitor_0 (call=2, rc=7, cib-update=7, confirmed=true) not running
Dec 10 20:49:57 a pgsql[903]: INFO: PostgreSQL is down
Dec 10 20:49:57 a lrmd: [890]: info: operation monitor[3] on pgsql:1 for client 893: pid 903 exited with return code 7
Dec 10 20:49:57 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_monitor_0 (call=3, rc=7, cib-update=8, confirmed=true) not running
Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Dec 10 20:49:57 a lrmd: [890]: info: rsc:pgsql:1 start[4] (pid 986)
Dec 10 20:49:57 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->STOP.
Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-pgsql:1 (-INFINITY)
Dec 10 20:49:57 a pgsql[986]: INFO: Set all nodes into async mode.
Dec 10 20:49:57 a pgsql[986]: INFO: server starting
Dec 10 20:49:57 a pgsql[986]: INFO: PostgreSQL start command sent.
Dec 10 20:49:58 a lrmd: [890]: info: RA output: (pgsql:1:start:stderr) psql: FATAL: the database system is starting up
Dec 10 20:49:58 a pgsql[986]: WARNING: Can't get PostgreSQL recovery status. rc=2
Dec 10 20:49:58 a pgsql[986]: WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command.
Dec 10 20:49:59 a pgsql[986]: INFO: PostgreSQL is started.
Dec 10 20:49:59 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->HS:alone.
Dec 10 20:49:59 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
Dec 10 20:49:59 a lrmd: [890]: info: operation start[4] on pgsql:1 for client 893: pid 986 exited with return code 0
Dec 10 20:49:59 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_start_0 (call=4, rc=0, cib-update=9, confirmed=true) ok
Dec 10 20:49:59 a lrmd: [890]: info: rsc:pgsql:1 notify[5] (pid 1163)
Dec 10 20:49:59 a lrmd: [890]: info: operation notify[5] on pgsql:1 for client 893: pid 1163 exited with return code 0
Dec 10 20:49:59 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_notify_0 (call=5, rc=0, cib-update=0, confirmed=true) ok
Dec 10 20:49:59 a lrmd: [890]: info: rsc:pgsql:1 monitor[6] (pid 1207)
Dec 10 20:49:59 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
I think this is wrong, because there are 2 live nodes. One of them can stay as master.
Also, this is in the PostgreSQL log on node a:
2013-12-06 10:56:53 MSK WARNING: archive_mode enabled, yet archive_command is not set
2013-12-06 10:57:37 MSK LOG: received SIGHUP, reloading configuration files
2013-12-06 10:57:37 MSK LOG: parameter "archive_command" changed to "cp %p /var/lib/postgresql/9.3/pg_archive/%f"
2013-12-06 10:57:43 MSK ERROR: a backup is not in progress
2013-12-06 10:57:43 MSK STATEMENT: SELECT pg_stop_backup()
2013-12-07 10:24:22 MSK LOG: received fast shutdown request
2013-12-07 10:24:22 MSK LOG: aborting any active transactions
2013-12-07 10:24:22 MSK LOG: autovacuum launcher shutting down
2013-12-07 10:24:22 MSK LOG: shutting down
2013-12-07 10:24:22 MSK LOG: database system is shut down
2013-12-07 10:24:29 MSK LOG: database system was shut down at 2013-12-07 10:24:22 MSK
2013-12-07 10:24:29 MSK LOG: autovacuum launcher started
2013-12-07 10:24:29 MSK LOG: database system is ready to accept connections
2013-12-07 10:24:29 MSK LOG: incomplete startup packet
2013-12-07 10:24:34 MSK LOG: received fast shutdown request
2013-12-07 10:24:34 MSK LOG: aborting any active transactions
2013-12-07 10:24:34 MSK LOG: autovacuum launcher shutting down
2013-12-07 10:24:34 MSK LOG: shutting down
2013-12-07 10:24:34 MSK LOG: database system is shut down
2013-12-07 14:31:11 MSK LOG: database system was shut down in recovery at 2013-12-07 14:29:19 MSK
cp: cannot stat `/var/lib/postgresql/9.3/pg_archive/00000002.history': No such file or directory
2013-12-07 14:31:11 MSK LOG: entering standby mode
cp: cannot stat `/var/lib/postgresql/9.3/pg_archive/000000010000000000000007': No such file or directory
2013-12-07 14:31:11 MSK LOG: consistent recovery state reached at 0/7000090
2013-12-07 14:31:11 MSK LOG: record with zero length at 0/7000090
2013-12-07 14:31:11 MSK LOG: database system is ready to accept read only connections
2013-12-07 14:31:12 MSK LOG: incomplete startup packet
2013-12-07 14:31:14 MSK FATAL: could not connect to the primary server: could not connect to server: No route to host
Is the server running on host "192.168.10.200" and accepting
TCP/IP connections on port 5432?
Why did the master not get a shutdown request? It is alive.
2. Here is my config:
node a.mydomain.com \
attributes pgsql-data-status="DISCONNECT"
node b.mydomain.com \
attributes pgsql-data-status="LATEST" pgsql-status="HS:async"
node c.mydomain.com \
attributes pgsql-data-status="STREAMING|SYNC" pgsql-status="HS:async"
primitive apache ocf:heartbeat:apache \
params configfile="/etc/apache2/apache2.conf" \
op monitor interval="1min"
primitive apache-master-ip ocf:heartbeat:IPaddr2 \
params ip="192.168.10.100" nic="peervpn0" \
op monitor interval="30s"
primitive pgsql ocf:heartbeat:pgsql \
params pgctl="/usr/lib/postgresql/9.3/bin/pg_ctl" \
psql="/usr/bin/psql" \
pgdata="/var/lib/postgresql/9.3/main" \
start_opt="-p 5432" \
rep_mode="sync" \
node_list="a.mydomain.com b.mydomain.com c.mydomain.com" \
restore_command="cp /var/lib/postgresql/9.3/pg_archive/%f %p" \
master_ip="192.168.10.200" \
restart_on_promote="true" \
config="/etc/postgresql/9.3/main/postgresql.conf" \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="4s" timeout="60s" on-fail="restart" \
op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
op promote interval="0s" timeout="60s" on-fail="restart" \
op demote interval="0s" timeout="60s" on-fail="stop" \
op stop interval="0s" timeout="60s" on-fail="block" \
op notify interval="0s" timeout="60s"
primitive pgsql-master-ip ocf:heartbeat:IPaddr2 \
params ip="192.168.10.200" nic="peervpn0" \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0s" timeout="60s" on-fail="block" \
meta target-role="Started"
group master pgsql-master-ip
ms msPostgresql pgsql \
meta master-max="1" master-node-max="1" clone-max="3" clone-node-max="1" target-role="Master" notify="true"
location prefer-apache-node apache 150: b.mydomain.com
colocation apache-with-ip inf: apache apache-master-ip
colocation set_ip inf: master msPostgresql:Master
order apache-after-ip inf: apache-master-ip apache
order ip_down 0: msPostgresql:demote master:stop symmetrical=false
order ip_up 0: msPostgresql:promote master:start symmetrical=false
property $id="cib-bootstrap-options" \
dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="openais" \
expected-quorum-votes="3" \
stonith-enabled="false" \
crmd-transition-delay="0" \
last-lrm-refresh="1386751770"
rsc_defaults $id="rsc-options" \
resource-stickiness="100" \
migration-threshold="1"
Where should I add rep_mode="async"? In each slave node's attributes?
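If I understand the suggestion correctly, rep_mode is a parameter of the pgsql primitive and not a node attribute, so I guess the change would simply replace rep_mode="sync" in the params of the primitive above. A rough sketch of what I mean (only the params are shown, the op lines would stay as they are, and I have not tested this):

```
# Untested sketch: the same pgsql primitive as in my config,
# with only rep_mode changed from "sync" to "async".
primitive pgsql ocf:heartbeat:pgsql \
    params pgctl="/usr/lib/postgresql/9.3/bin/pg_ctl" \
        psql="/usr/bin/psql" \
        pgdata="/var/lib/postgresql/9.3/main" \
        start_opt="-p 5432" \
        rep_mode="async" \
        node_list="a.mydomain.com b.mydomain.com c.mydomain.com" \
        restore_command="cp /var/lib/postgresql/9.3/pg_archive/%f %p" \
        master_ip="192.168.10.200" \
        restart_on_promote="true" \
        config="/etc/postgresql/9.3/main/postgresql.conf"
```

With this the node attributes (pgsql-data-status, pgsql-status) would not be edited by hand at all, if I read it right.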
2013/12/13 Takehiro Matsushima <[email protected]>
> Hello,
>
> 1. Does it work stably after that? Does failover work correctly, too?
>
> 2. I see. In this case, specify rep_mode="async" in the crm config; then
> all slaves run in async mode.
>
> --
> Regards,
> Takehiro Matsushima
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>