Hi Anzai-san

2014-05-16 14:01 GMT+09:00 Naoya Anzai <[email protected]>:
> Hi All,
>
> I'm using the pgsql resource agent (resource-agents-3.9.5-9) on Fedora 20.
>
> I'm using it to test various failure patterns in a replicated pgsql cluster.
>
> I think that if the MASTER PostgreSQL process is suspended for a long time,
> the resource monitor and demote operations time out, and the cluster cannot
> fail over until the process resumes.
>
> -----Cluster status after the master demotion timed out-----
> Online: [ server1 server2 ]
>
>  Master/Slave Set: msPostgresql [pgsql]
>      pgsql      (ocf::heartbeat:pgsql): FAILED server2
>      Stopped: [ server1 ]
>  Clone Set: ping-gw-rsc-clone [ping-gw-rsc]
>      Started: [ server1 server2 ]
>
> Node Attributes:
> * Node server1:
>     + master-pgsql                      : -INFINITY
>     + pgsql-data-status                 : STREAMING|SYNC
>     + pgsql-status                      : STOP
>     + ping-gw1                          : 100
> * Node server2:
>     + master-pgsql                      : -INFINITY
>     + pgsql-data-status                 : LATEST
>     + pgsql-status                      : PRI
>     + ping-gw1                          : 100
>
> Migration summary:
> * Node server1:
> * Node server2:
>    pgsql: migration-threshold=1 fail-count=2 last-failure='Fri Apr 11 14:07:43 2014'
>
> Failed actions:
>     pgsql_demote_0 on server2 'unknown error' (1): call=77, status=Timed Out, last-rc-change='Fri Apr 11 14:06:43 2014', queued=1ms, exec=60001ms
> -------------------------------------------------------
>
> I think pgsql_real_stop() should send SIGKILL to PostgreSQL when the
> immediate shutdown command (pg_ctl -m i) times out.
>
> What do you think about this idea?

I think it makes sense.
However, I would like you to keep the current stopping process as the default,
because I think it's safer to rely on STONITH.
If you implement it, could you make it optional through a new parameter?
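
To illustrate the proposal together with the request to keep the current behaviour as the default, here is a minimal sketch in Python. The resource agent itself is a shell script, so this is not its actual code. It assumes the postmaster PID is already known and that the stop command is something like "pg_ctl -m i stop". `kill_on_timeout` is a hypothetical stand-in for the new opt-in parameter, and all function names are made up for illustration.

```python
import os
import signal
import subprocess
import time


def pid_alive(pid: int) -> bool:
    """Probe a PID with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid: int, deadline: float, poll: float = 0.1) -> bool:
    """Poll until the PID disappears or the monotonic deadline passes."""
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return True
        time.sleep(poll)
    return not pid_alive(pid)


def stop_with_kill_fallback(stop_cmd, pid, timeout, kill_on_timeout=False):
    """Run an immediate-shutdown command; optionally SIGKILL the server if it hangs.

    Returns "stopped", "killed" or "timeout". kill_on_timeout models a
    hypothetical opt-in parameter: by default the old behaviour is kept and
    the failure is left to the cluster (e.g. STONITH).
    """
    deadline = time.monotonic() + timeout
    try:
        # subprocess.run kills the stop command itself if it exceeds the timeout
        subprocess.run(stop_cmd, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        pass
    if wait_for_exit(pid, deadline):
        return "stopped"
    if not kill_on_timeout:
        return "timeout"
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "stopped"  # it exited on its own just before the kill
    # SIGKILL cannot be caught; allow a few seconds for the process to disappear
    if wait_for_exit(pid, time.monotonic() + 5.0):
        return "killed"
    return "timeout"
```

With `kill_on_timeout` left at False the function only reports "timeout", so the cluster's existing recovery path still applies. Probing with signal 0 cannot distinguish a zombie from a live process, which is not an issue here because the postmaster is not a child of the agent.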


BTW, is the timeout really caused by "pg_ctl (-m i)" rather than by the "while" loop?
If it is "pg_ctl (-m i)", you need to use a timeout parameter, or you can use
exec_with_timeout().
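
exec_with_timeout() is part of the resource agent's shell code, and its exact behaviour is not reproduced here. As a rough Python analogue, assuming the goal is simply to bound how long pg_ctl may run, the command can be started in its own process group, and the whole group can be killed if it overruns. `run_with_timeout` is a hypothetical name.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd in a new session (its own process group); SIGKILL the group on overrun.

    Returns the command's exit status, or None if it was killed for timing out.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # kill the command and anything it forked, not just the direct child
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already vanished
        proc.wait()  # reap the child so no zombie is left behind
        return None
```

Returning None instead of an exit code lets the caller distinguish "pg_ctl failed" from "pg_ctl hung", which is the distinction asked about here.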

Thanks,
Takatoshi MATSUO
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
