I wrote:
> The reporting critters are all on the slow side, so I suspected
> a timing problem, especially since it only started to show up
> after changing pg_ctl's timing behavior. I can't reproduce it
> locally on unmodified sources, but I could after putting my thumb
> on the scales like this:
>
> diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
> index ea9d21a..7ef22c1 100644
> *** a/src/backend/replication/walreceiver.c
> --- b/src/backend/replication/walreceiver.c
> *************** WalReceiverMain(void)
> *** 446,451 ****
> --- 446,454 ----
>   				/*
>   				 * Something was received from master, so reset
>   				 * timeout
>   				 */
> +
> + 				pg_usleep(100000);	/* be slow about processing */
> +
>   				last_recv_timestamp = GetCurrentTimestamp();
>   				ping_sent = false;
>
>   				XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
BTW, I neglected to mention that this hack also produces a second failure
in the recovery test suite:
[20:28:31] t/001_stream_rep.pl .................. 11/28
# Failed test 'xmin of cascaded slot null with no hs_feedback'
# at t/001_stream_rep.pl line 178.
# got: undef
# expected: ''
# Failed test 'catalog xmin of cascaded slot null with no hs_feedback'
# at t/001_stream_rep.pl line 179.
# got: undef
# expected: ''
[20:28:31] t/001_stream_rep.pl .................. 28/28 # Looks like you failed 2 tests of 28.
[20:28:31] t/001_stream_rep.pl .................. Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/28 subtests
I had supposed last night that this was merely a matter of needing
to insert a wait_slot_xmins() call before those tests, but doing so
doesn't help. And, looking closer, the report is complaining that
$xmin and $catalog_xmin are coming back as undef, not empty strings.
My Perl-fu is too weak to figure out how that could possibly be ---
the documentation for PostgresNode::query_hash certainly states
flatly that it will never happen. And it even more certainly
shouldn't happen if we already successfully probed the
pg_replication_slots row in wait_slot_xmins().
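
[Illustration added in editing, not part of the original mail.] The classic way an undef escapes from a hash-building helper is the query matching no row at all, so there are no values to pair with the column names; a NULL column in a row that does exist comes back as an empty string, as the documentation promises. A rough Python analogue of the two cases, with hypothetical names (None standing in for Perl's undef):

```python
def query_hash(columns, row):
    """Toy stand-in for a query_hash-style helper: pair column names
    with values from a result row. A missing row yields None for every
    key -- the Python analogue of Perl's undef."""
    if row is None:                      # query matched no rows
        return {c: None for c in columns}
    return dict(zip(columns, row))

# Row present, columns NULL: values come back as empty strings.
got = query_hash(["xmin", "catalog_xmin"], ["", ""])
assert got["xmin"] == "" and got["catalog_xmin"] == ""

# No row at all: every value is None -- the "got: undef" symptom.
got = query_hash(["xmin", "catalog_xmin"], None)
assert got["xmin"] is None and got["catalog_xmin"] is None
```

That is, "got: undef" smells like the slot row vanishing between the probe and the test, rather than a NULL-valued column.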
The corresponding entries in the postmaster's log don't look abnormal:
2017-07-02 20:07:29.313 UTC [30862] LOG:  database system is ready to accept read only connections
2017-07-02 20:07:29.318 UTC [30867] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
2017-07-02 20:07:29.415 UTC [30870] t/001_stream_rep.pl LOG:  statement: SELECT pg_create_physical_replication_slot('standby_2');
2017-07-02 20:07:29.619 UTC [30863] LOG:  redo starts at 0/3029458
2017-07-02 20:07:29.665 UTC [30882] t/001_stream_rep.pl LOG:  statement:
	SELECT xmin IS NULL AND catalog_xmin IS NULL
	FROM pg_catalog.pg_replication_slots
	WHERE slot_name = 'standby_2';
2017-07-02 20:07:29.679 UTC [30884] t/001_stream_rep.pl LOG:  statement: SELECT plugin, slot_type, datoid, database, active, active_pid, xmin, catalog_xmin, restart_lsn FROM pg_catalog.pg_replication_slots WHERE slot_name = 'standby_2'
(This is from a run with a wait_slot_xmins() call added, else the
statement logged by PID 30882 would not have been issued.)
Any ideas what's wrong there?
regards, tom lane
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers