Hi Anzai-san

2014-05-16 14:01 GMT+09:00 Naoya Anzai <[email protected]>:
> Hi All,
>
> I'm using pgsql resource agent ( resource-agents-3.9.5-9 ) on fedora20.
>
> I'm testing various failure patterns in a pgsql replicated cluster using it.
>
> I think if MASTER PostgreSQL process has suspended for a long time,
> then the resource monitoring and demotion timed out, and the cluster cannot
> failover until resume.
>
> -----the Cluster status after master demotion timed out.-----
> Online: [ server1 server2 ]
>
>  Master/Slave Set: msPostgresql [pgsql]
>      pgsql (ocf::heartbeat:pgsql): FAILED server2
>      Stopped: [ server1 ]
>  Clone Set: ping-gw-rsc-clone [ping-gw-rsc]
>      Started: [ server1 server2 ]
>
> Node Attributes:
> * Node server1:
>     + master-pgsql      : -INFINITY
>     + pgsql-data-status : STREAMING|SYNC
>     + pgsql-status      : STOP
>     + ping-gw1          : 100
> * Node server2:
>     + master-pgsql      : -INFINITY
>     + pgsql-data-status : LATEST
>     + pgsql-status      : PRI
>     + ping-gw1          : 100
>
> Migration summary:
> * Node server1:
> * Node server2:
>    pgsql: migration-threshold=1 fail-count=2 last-failure='Fri Apr 11
> 14:07:43 2014'
>
> Failed actions:
>     pgsql_demote_0 on server2 'unknown error' (1): call=77, status=Timed Out,
> last-rc-change='Fri Apr 11 14:06:43 2014', queued=1ms, exec=60001ms
> -------------------------------------------------------
>
> I think pgsql_real_stop() had better throw SIGKILL to PostgreSQL when the
> shutdown(-m i) command has timed out.
>
> What do you think about my opinion?

I think it makes sense.
However, I would like you to keep the current stop process, because
I think it is safer to rely on STONITH.
If you implement this, could you add it as a new parameter?

BTW, is it true that the cause of the time-out is not the "while" loop
but "pg_ctl (-m i)"?
If it is "pg_ctl (-m i)", you need to use a time-out parameter, or
you can use exec_with_timeout().

Thanks,
Takatoshi MATSUO
