After a night of sitting idle, both slaves were disconnected.
I tried to recover them and got this status:
Current DC: c.mydomain.com - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
4 Resources configured.
============
Online: [ a.mydomain.com c.mydomain.com b.mydomain.com ]
 Resource Group: master
     pgsql-master-ip (ocf::heartbeat:IPaddr2): Started a.mydomain.com
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ a.mydomain.com ]
     Slaves: [ b.mydomain.com c.mydomain.com ]
Node Attributes:
* Node a.mydomain.com:
    + master-pgsql:0 : 1000
    + master-pgsql:1 : 1000
    + pgsql-data-status : LATEST
    + pgsql-master-baseline : 0000000008000090
    + pgsql-status : PRI
* Node c.mydomain.com:
    + master-pgsql:2 : -INFINITY
    + pgsql-data-status : STREAMING|ASYNC
    + pgsql-status : HS:async
* Node b.mydomain.com:
    + master-pgsql:0 : -INFINITY
    + master-pgsql:1 : 100
    + pgsql-data-status : STREAMING|SYNC
    + pgsql-status : HS:sync
    + pgsql-xlog-loc : 000000000E0000C8
Migration summary:
* Node a.mydomain.com:
* Node c.mydomain.com:
* Node b.mydomain.com:
My questions:
1. Why did the slaves stay disconnected? How can I find the reason?
2. Why does one node stay in sync replication while the other is async?
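
To look for the reason, it may help to pull just the pgsql-* attributes out of the status output and compare them per node. This is a minimal sketch, assuming the status comes from `crm_mon -1 -A` and has the same `* Node` / `+ attribute : value` layout as the output above. The function name `pgsql_attrs` is only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read crm_mon output on stdin and print one
# "node attribute value" line for every pgsql-* node attribute.
pgsql_attrs() {
    awk '
        # "* Node a.mydomain.com:" starts a new node section
        /^[[:space:]]*\* Node / { node = $3; sub(/:$/, "", node); next }
        # "+ pgsql-status : PRI" is an attribute of the current node
        /^[[:space:]]*\+ pgsql-/ { print node, $2, $4 }
    '
}

# Usage (assumes crm_mon is in PATH on a cluster node):
#   crm_mon -1 -A | pgsql_attrs
```

A node whose pgsql-data-status shows DISCONNECT in this list would be the first one to check in the PostgreSQL log.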
2013/12/10 Andrey Rogovsky <[email protected]>
> I installed the RA from this branch.
> Now one node from the ms set can stay master. But!
> postgres@a:~$ psql -c "select client_addr,sync_state from pg_stat_replication;"
>  client_addr  | sync_state
> --------------+------------
>  192.168.10.3 | sync
>  192.168.10.2 | async
>
> Why is one slave in sync but the other in async?
>
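
A note on the sync/async split, as far as I understand it: if this is PostgreSQL 9.1 to 9.5, only one standby is synchronous at a time, and a standby whose application_name is not listed in synchronous_standby_names is reported as async with sync_priority 0. Whether the RA lists only one standby there is an assumption that would need to be verified on the master. Below is a minimal sketch for reading the view with sync_priority added. The helper name `explain_sync_state` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "client_addr|sync_priority|sync_state" rows
# (psql unaligned, tuples-only output) on stdin and describe each standby.
explain_sync_state() {
    awk -F'|' '
        $3 == "sync"      { print $1 ": synchronous standby (priority " $2 ")"; next }
        $3 == "potential" { print $1 ": listed in synchronous_standby_names, waiting as a candidate (priority " $2 ")"; next }
        $2 == 0           { print $1 ": not listed in synchronous_standby_names, so asynchronous"; next }
                          { print $1 ": " $3 }
    '
}

# Usage on the master (assumes psql can connect as the current user):
#   psql -At -c "select client_addr, sync_priority, sync_state from pg_stat_replication;" | explain_sync_state
#   psql -At -c "show synchronous_standby_names;"
```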
> Here are the logs. This is from the async node:
> Dec 10 11:53:47 2 lrmd: [26383]: info: operation notify[7] on pgsql:0 for
> client 26386: pid 29500 exited with return code 0
> Dec 10 11:53:47 2 crmd: [26386]: info: process_lrm_event: LRM operation
> pgsql:0_notify_0 (call=7, rc=0, cib-update=0, confirmed=true) ok
> Dec 10 11:53:49 2 pgsql[29550]: INFO: Master does not exist.
> Dec 10 11:53:49 2 pgsql[29550]: WARNING: My data is out-of-date.
> status=DISCONNECT
> Dec 10 11:53:50 2 lrmd: [26383]: info: rsc:pgsql:0 notify[8] (pid 29629)
> Dec 10 11:53:50 2 lrmd: [26383]: info: operation notify[8] on pgsql:0 for
> client 26386: pid 29629 exited with return code 0
> Dec 10 11:53:50 2 crmd: [26386]: info: process_lrm_event: LRM operation
> pgsql:0_notify_0 (call=8, rc=0, cib-update=0, confirmed=true) ok
> Dec 10 11:53:56 2 attrd: [26384]: notice: attrd_ais_dispatch: Update
> relayed from a.mydomain.com
> Dec 10 11:53:56 2 attrd: [26384]: notice: attrd_trigger_update: Sending
> flush op to all hosts for: pgsql-status (HS:async)
> Dec 10 11:53:56 2 attrd: [26384]: notice: attrd_perform_update: Sent
> update 20: pgsql-status=HS:async
>
> This is from the sync node:
> Dec 10 11:53:50 c crmd: [23076]: info: process_lrm_event: LRM operation
> pgsql:2_notify_0 (call=12, rc=0, cib-update=0, confirmed=true) ok
> Dec 10 11:53:54 c attrd: [23074]: notice: attrd_ais_dispatch: Update
> relayed from a.mydomain.com
> Dec 10 11:53:54 c attrd: [23074]: notice: attrd_trigger_update: Sending
> flush op to all hosts for: pgsql-status (HS:async)
> Dec 10 11:53:54 c attrd: [23074]: notice: attrd_perform_update: Sent
> update 59: pgsql-status=HS:async
> Dec 10 11:53:56 c attrd: [23074]: notice: attrd_ais_dispatch: Update
> relayed from a.mydomain.com
> Dec 10 11:53:56 c attrd: [23074]: notice: attrd_trigger_update: Sending
> flush op to all hosts for: master-pgsql:2 (100)
> Dec 10 11:53:56 c attrd: [23074]: notice: attrd_perform_update: Sent
> update 63: master-pgsql:2=100
> Dec 10 11:53:56 c attrd: [23074]: notice: attrd_ais_dispatch: Update
> relayed from a.mydomain.com
> Dec 10 11:53:56 c attrd: [23074]: notice: attrd_trigger_update: Sending
> flush op to all hosts for: pgsql-status (HS:sync)
> Dec 10 11:53:56 c attrd: [23074]: notice: attrd_perform_update: Sent
> update 65: pgsql-status=HS:sync
>
>
>
> 2013/12/8 Takatoshi MATSUO <[email protected]>
>
>> 2013/12/8 Andrey Rogovsky <[email protected]>:
>> > 1. Yes
>> > 2. No
>> > 3. I have 3 nodes
>> > 4. I have these errors:
>> > Dec 7 17:35:28 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > /usr/lib/ocf/resource.d//heartbeat/pgsql: 1749: /usr/lib/ocf
>> > /resource.d//heartbeat/pgsql:
>> > Dec 7 17:35:28 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > ocf_local_nodename: not found
>>
>> Your resource-agents package doesn't have the ocf_local_nodename function.
>> This function is implemented by this patch.
>>
>> https://github.com/ClusterLabs/resource-agents/commit/abc1c3f6464f6e5e7a1e41cd7c9b8179896c1903
>>
>> How about using this commit?
>>
>> https://github.com/ClusterLabs/resource-agents/blob/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e/heartbeat/pgsql
>>
>>
>> I fixed the wiki for Fedora 19.
>>
>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#Replacement_of_pgsql_RA_.28both_nodes.29
>>
>> But I'm afraid I don't know whether it works under Debian 7.
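
Before replacing the RA, it may be worth checking whether the installed shell function library already defines ocf_local_nodename. This is a minimal sketch. The library path in the usage comment is an assumption based on the /usr/lib/ocf prefix seen in the log, and `has_ocf_func` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE defines the shell function NAME.
has_ocf_func() {
    # $1 = function name, $2 = file to search
    grep -Eq "^[[:space:]]*$1[[:space:]]*\(\)" "$2"
}

# Usage (path is an assumption; adjust to where ocf-shellfuncs is installed):
#   has_ocf_func ocf_local_nodename /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs \
#       && echo "present" || echo "missing"
```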
>>
>> > Dec 7 17:35:28 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > Dec 7 17:35:28 a pgsql[25791]: INFO: Master does not exist.
>> > Dec 7 17:35:28 a pgsql[25791]: INFO: My data status=.
>> > Dec 7 17:35:28 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > Could not map uname=-n to a UUID: The object/attribute does
>> > not exist
>> > Dec 7 17:35:28 a pgsql[25791]: WARNING: Can't get a.mydomain.com xlog
>> > location.
>> > Dec 7 17:35:28 a pgsql[25791]: WARNING: Can't get b.mydomain.com xlog
>> > location.
>> > Dec 7 17:35:28 a pgsql[25791]: WARNING: Can't get c.mydomain.com xlog
>> > location.
>> > Dec 7 17:35:32 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > /usr/lib/ocf/resource.d//heartbeat/pgsql: 1749: /usr/lib/ocf
>> > /resource.d//heartbeat/pgsql:
>> > Dec 7 17:35:32 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > ocf_local_nodename: not found
>> > Dec 7 17:35:32 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > Dec 7 17:35:33 a pgsql[25934]: INFO: Master does not exist.
>> > Dec 7 17:35:33 a pgsql[25934]: INFO: My data status=.
>> > Dec 7 17:35:33 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > Could not map uname=-n to a UUID: The object/attribute does
>> > not exist
>> > Dec 7 17:35:33 a pgsql[25934]: WARNING: Can't get a.mydomain.com xlog
>> > location.
>> > Dec 7 17:35:33 a pgsql[25934]: WARNING: Can't get b.mydomain.com xlog
>> > location.
>> > Dec 7 17:35:33 a pgsql[25934]: WARNING: Can't get c.mydomain.com xlog
>> > location.
>> > Dec 7 17:35:37 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > /usr/lib/ocf/resource.d//heartbeat/pgsql: 1749: /usr/lib/ocf
>> > /resource.d//heartbeat/pgsql:
>> > Dec 7 17:35:37 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > ocf_local_nodename: not found
>> > Dec 7 17:35:37 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > Dec 7 17:35:37 a pgsql[26080]: INFO: Master does not exist.
>> > Dec 7 17:35:37 a pgsql[26080]: INFO: My data status=.
>> > Dec 7 17:35:37 a lrmd: [2452]: info: RA output:
>> (pgsql:0:monitor:stderr)
>> > Could not map uname=-n to a UUID: The object/attribute does
>> > not exist
>> >
>> >
>> >
>> >
>> > 2013/12/8 Takehiro Matsushima <[email protected]>
>> >
>> >> Hi.
>> >>
>> >> May I confirm whether you have already tried the following?
>> >>
>> >> 1. Is Streaming Replication OK without Pacemaker?
>> >> (Master/SyncSlave/AsyncSlave)
>> >>
>> >> 2. Can a node be promoted to Master state without the other nodes?
>> >> 3. And how about a two-node configuration?
>> >>
>> >> 4. Do corosync's log and PostgreSQL's log have any hints?
>> >>
>> >> I'm sorry if you have already done so.
>> >>
>> >> Regards,
>> >> Takehiro Matsushima
>> >> _______________________________________________
>> >> Linux-HA mailing list
>> >> [email protected]
>> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> >> See also: http://linux-ha.org/ReportingProblems
>> >>