Re: [GENERAL] Interesting streaming replication issue

Gunnar "Nick" Bluth Wed, 26 Jul 2017 23:43:17 -0700

(sorry for the top post, bitchy K9 Mail) 

James,


are you sure you're scp'ing from the archive, not from pg_xlog? 

Regards, 

Gunnar "Nick" Bluth

Am 27. Juli 2017 05:00:17 MESZ schrieb James Sewell <james.sew...@jirotech.com>:
>Hi all,
>
>I've got two servers (A,B) which are part of a streaming replication
>pair.
>A is the master, B is a hot standby. I'm sending archived WAL to a
>directory on A, B is reading it via SCP.
>
>This all works fine normally. I'm on Redhat 7.3, running EDB 9.6.2 (I'm
>currently working to reproduce with standard 9.6)
>
>We have recently seen a situation where B does not catch up when taken
>offline for maintenance.
>
>When B is started we see the following in the logs:
>
>2017-07-27 11:55:57 AEST [21432]: [979-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000AA" from
>archive
>2017-07-27 11:55:58 AEST [21432]: [980-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000AB" from
>archive
>2017-07-27 11:55:58 AEST [21432]: [981-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000AC" from
>archive
>2017-07-27 11:55:59 AEST [21432]: [982-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000AD" from
>archive
>2017-07-27 11:55:59 AEST [21432]: [983-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000AE" from
>archive
>2017-07-27 11:56:00 AEST [21432]: [984-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000AF" from
>archive
>2017-07-27 11:56:00 AEST [21432]: [985-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000B0" from
>archive
>2017-07-27 11:56:01 AEST [21432]: [986-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000B1" from
>archive
>2017-07-27 11:56:01 AEST [21432]: [987-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000B2" from
>archive
>2017-07-27 11:56:02 AEST [21432]: [988-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000B3" from
>archive
>2017-07-27 11:56:02 AEST [21432]: [989-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000B4" from
>archive
>2017-07-27 11:56:03 AEST [21432]: [990-1] user=,db=,client=
>(0:00000)LOG:  restored log file "0000000C0000005A000000B5" from
>archive
>scp: /archive/xlog//0000000C0000005A000000B6: No such file or directory
>2017-07-27 11:56:03 AEST [46191]: [1-1] user=,db=,client=
>(0:00000)LOG:  started streaming WAL from primary at 5A/B5000000 on
>timeline 12
>2017-07-27 11:56:03 AEST [46191]: [2-1] user=,db=,client=
>(0:XX000)FATAL:  could not receive data from WAL stream: ERROR:
>requested WAL segment 0000000C0000005A000000B5 has already been
>removed
>
>scp: /archive/xlog//0000000D.history: No such file or directory
>scp: /archive/xlog//0000000C0000005A000000B6: No such file or directory
>2017-07-27 11:56:04 AEST [46203]: [1-1] user=,db=,client=
>(0:00000)LOG:  started streaming WAL from primary at 5A/B5000000 on
>timeline 12
>2017-07-27 11:56:04 AEST [46203]: [2-1] user=,db=,client=
>(0:XX000)FATAL:  could not receive data from WAL stream: ERROR:
>requested WAL segment 0000000C0000005A000000B5 has already been
>removed
>
>This will loop indefinitely. At this stage the master reports no
>connected
>standbys in pg_stat_replication, and the standby has no running WAL
>receiver process.
>
>This can be 'fixed' by running pg_switch_xlog() on the master, at which
>time a connection is seen from the standby and the logs show the
>following:
>
>scp: /archive/xlog//0000000D.history: No such file or directory
>2017-07-27 12:03:19 AEST [21432]: [1029-1] user=,db=,client= 
>(0:00000)LOG:
> restored log file "0000000C0000005A000000B5" from archive
>scp: /archive/xlog//0000000C0000005A000000B6: No such file or directory
>2017-07-27 12:03:19 AEST [63141]: [1-1] user=,db=,client= 
>(0:00000)LOG:
> started streaming WAL from primary at 5A/B5000000 on timeline 12
>2017-07-27 12:03:19 AEST [63141]: [2-1] user=,db=,client= 
>(0:XX000)FATAL:
> could not receive data from WAL stream: ERROR:  requested WAL segment
>0000000C0000005A000000B5 has already been removed
>
>scp: /archive/xlog//0000000D.history: No such file or directory
>2017-07-27 12:03:24 AEST [21432]: [1030-1] user=,db=,client= 
>(0:00000)LOG:
> restored log file "0000000C0000005A000000B5" from archive
>2017-07-27 12:03:24 AEST [21432]: [1031-1] user=,db=,client= 
>(0:00000)LOG:
> restored log file "0000000C0000005A000000B6" from archive
>scp: /archive/xlog//0000000C0000005A000000B7: No such file or directory
>2017-07-27 12:03:25 AEST [21432]: [1032-1] user=,db=,client= 
>(0:00000)LOG:
>unexpected pageaddr 56/E7000000 in log segment
>0000000C0000005A000000B7,
>offset 0
>2017-07-27 12:03:25 AEST [63196]: [1-1] user=,db=,client= 
>(0:00000)LOG:
> started streaming WAL from primary
>
>At this time replication is running as normal and all changes are
>streamed.
>
>Am I missing something here, this seems very odd. One thing I have
>noticed
>is it only seems to be caused after a lot of WAL is produced and the
>pg_xlog
>directory is sitting at max_wal_size
>
>James Sewell,
>PostgreSQL Team Lead / Solutions Architect
>
>
>
>Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
>*P *(+61) 2 8099 9000 <(+61)%202%208099%209000>  *W* www.jirotech.com 
>*F *
>(+61) 2 8099 9099 <(+61)%202%208099%209000>
>
>-- 
>
>------------------------------
>The contents of this email are confidential and may be subject to legal
>or 
>professional privilege and copyright. No representation is made that
>this 
>email is free of viruses or other defects. If you have received this 
>communication in error, you may not copy or distribute any part of it
>or 
>otherwise disclose its contents to anyone. Please advise the sender of
>your 
>incorrect receipt of this correspondence.

-- 
Diese Nachricht wurde von meinem Android-Mobiltelefon mit K-9 Mail gesendet.

Re: [GENERAL] Interesting streaming replication issue

Reply via email to