Re: [HACKERS] Race condition between hot standby and restoring a FPW

Heikki Linnakangas Wed, 12 Nov 2014 07:15:55 -0800

On 11/12/2014 04:56 PM, Tom Lane wrote:

> Heikki Linnakangas <[email protected]> writes:
>> There's a race condition between a backend running queries in hot
>> standby mode, and restoring a full-page image from a WAL record. It's
>> present in all supported versions.
>>
>> I can think of two ways to fix this:
>>
>> 1. Have ReadBufferExtended lock the page in RBM_ZERO mode, before
>> returning it. That makes the API inconsistent, as the function would
>> sometimes lock the page, and sometimes not.
>
> Ugh.
>
>> 2. When ReadBufferExtended doesn't find the page in cache, it returns
>> the buffer in !BM_VALID state (i.e. still in I/O in-progress state).
>> Require the caller to call a second function, after locking the page, to
>> finish the I/O.
>
> Not great either.  What about an RBM_NOERROR mode that is like RBM_ZERO
> in terms of handling error conditions, but does not forcibly zero the page
> if it's already valid?


Isn't that exactly what RBM_ZERO_ONERROR does?

Anyway, you don't want to read the page from disk, just to check if it's
already valid. We stopped doing that in 8.2 (commit
8c3cc86e7b688b0efe5ec6ce4f4342c2883b1db5), and it gave a big speedup to
recovery.

(Note that when the page is already in the buffer-cache, RBM_ZERO
already doesn't zero the page. So this race condition only happens when
the page isn't in the buffer cache yet).
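
To make the interleaving concrete, here is a toy, single-threaded model of
the window described above. It is only a sketch and not PostgreSQL code:
ToyBuffer, toy_read_zero, toy_restore_fpw and toy_reader_sees_zeroed_page
are hypothetical names, and the "lock" is a plain flag rather than a real
buffer content lock. The lock_before_return argument models fix 1 (the
read function hands the page back already locked).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical stand-in for a shared buffer; not a real BufferDesc. */
typedef struct ToyBuffer
{
    bool in_cache;              /* page already resident in the toy cache */
    bool locked;                /* exclusive lock held by the redo process */
    char page[TOY_PAGE_SIZE];
} ToyBuffer;

/*
 * Models the RBM_ZERO behaviour described in the thread: a page that is
 * already cached is returned untouched, a page that is not cached is
 * zero-filled instead of being read from disk.
 */
static void
toy_read_zero(ToyBuffer *buf, bool lock_before_return)
{
    if (!buf->in_cache)
    {
        memset(buf->page, 0, TOY_PAGE_SIZE);
        buf->in_cache = true;
    }
    if (lock_before_return)
        buf->locked = true;     /* fix 1: return the page already locked */
}

static bool
toy_page_is_zero(const ToyBuffer *buf)
{
    for (int i = 0; i < TOY_PAGE_SIZE; i++)
        if (buf->page[i] != 0)
            return false;
    return true;
}

/* Redo copies the full-page image in while holding the lock. */
static void
toy_restore_fpw(ToyBuffer *buf)
{
    buf->locked = true;         /* no-op if the page came back locked */
    assert(buf->locked);
    memset(buf->page, 'n', TOY_PAGE_SIZE);  /* 'n' = new page image */
    buf->locked = false;
}

/*
 * Returns true if a backend scheduled between "get the buffer" and
 * "restore the image" could observe an all-zeros page.
 */
static bool
toy_reader_sees_zeroed_page(bool was_cached, bool lock_before_return)
{
    ToyBuffer buf = { .in_cache = was_cached, .locked = false };
    bool exposed;

    memset(buf.page, 'o', TOY_PAGE_SIZE);   /* 'o' = old, valid contents */
    toy_read_zero(&buf, lock_before_return);

    /* A hot standby backend may run here, before redo locks the page. */
    exposed = !buf.locked && toy_page_is_zero(&buf);

    toy_restore_fpw(&buf);
    return exposed;
}
```

In this model the zeroed page is visible only when the page was not cached
and the read function returned it unlocked; a cached page, or a page
returned already locked, is never seen in the all-zeros state.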


- Heikki



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
