Package: postgresql-11-pg-checksums
Version: 0.8-2
Severity: important

The following bug report from pgsql-hackers also pertains to
pg-checksums in buster:

On Tue, Mar 19, 2019 at 01:00:50PM -0700, Andres Freund wrote:
> On 2019-03-20 03:27:55 +0800, Stephen Frost wrote:
> > On Tue, Mar 19, 2019 at 23:59 Andres Freund <[email protected]> wrote:
> > > On 2019-03-19 16:52:08 +0100, Michael Banck wrote:
> > > > Am Dienstag, den 19.03.2019, 11:22 -0400 schrieb Robert Haas:
> > > > > It's torn pages that I am concerned about - the server is writing and
> > > > > we are reading, and we get a mix of old and new content.  We have been
> > > > > quite diligent about protecting ourselves from such risks elsewhere,
> > > > > and checksum verification should not be held to any lesser standard.
> > > >
> > > > If we see a checksum failure on an otherwise correctly read block in
> > > > online mode, we retry the block on the theory that we might have read a
> > > > torn page.  If the checksum verification still fails, we compare its LSN
> > > > to the LSN of the current checkpoint and don't mind if its newer.  This
> > > > way, a torn page should not cause a false positive either way I
> > > > think?.
> > >
> > > False positives, no. But there's plenty potential for false
> > > negatives. In plenty clusters a large fraction of the pages is going to
> > > be touched in most checkpoints.
> >
> > How is it a false negative?  The page was in the middle of being
> > written,
>
> You don't actually know that. It could just be random gunk in the LSN,
> and this type of logic just ignores such failures as long as the random
> gunk is above the system's LSN.
>
> And the basebackup logic doesn't just ignore if both the checksum
> failed, and the lsn is between startptr and current insertion pointer -
> it just does it with *any* page that has a pd_upper != 0 and a pd_lsn >
> startptr. Given typical startlsn values (skewing heavily towards lower
> int64s), that means that random data is more likely than not to pass
> this test.
>
> As it stands, the logic seems to give more false confidence than
> anything else.
>
>
> > > The basebackup checksum verification works in the same way.
> > >
> > > Shouldn't have been merged that way.
> >
> >
> > I have a hard time not finding this offensive.  These issues were
> > considered, discussed, and well thought out, with the result being
> > committed after agreement.
>
> Well, I don't know what to tell you. But:
>
>                               /*
>                                * Only check pages which have not been 
> modified since the
>                                * start of the base backup. Otherwise, they 
> might have been
>                                * written only halfway and the checksum would 
> not be valid.
>                                * However, replaying WAL would reinstate the 
> correct page in
>                                * this case. We also skip completely new 
> pages, since they
>                                * don't have a checksum yet.
>                                */
>                               if (!PageIsNew(page) && PageGetLSN(page) < 
> startptr)
>                               {
>
> doesn't consider plenty scenarios, as pointed out above.  It'd be one
> thing if the concerns I point out above were actually commented upon and
> weighed not substantial enough (not that I know how). But...
>
> > Do you have any example cases where the code in pg_basebackup has resulted
> > in either a false positive or a false negative?  Any case which can be
> > shown to result in either?
>
> CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, 
> 1000000) g(i);
> SELECT pg_relation_size('corruptme');
> postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || 
> pg_relation_filepath('corruptme');
> ┌─────────────────────────────────────┐
> │              ?column?               │
> ├─────────────────────────────────────┤
> │ /srv/dev/pgdev-dev/base/13390/16384 │
> └─────────────────────────────────────┘
> (1 row)
> dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 
> conv=notrunc
>
> Try a basebackup and see how many times it'll detect the corrupt
> data. In the vast majority of cases you're going to see checksum
> failures when reading the data for normal operation, but not when using
> basebackup (or this new tool).
>
> At the very very least this would need to do
>
> a) checks that the page is all zeroes if PageIsNew() (like
>    PageIsVerified() does for the backend). That avoids missing cases
>    where corruption just zeroed out the header, but not the whole page.
> b) Check that pd_lsn is between startlsn and the insertion pointer. That
>    avoids accepting just about all random data.
>
> And that'd *still* be less strenuous than what normal backends
> check. And that's already not great (due to not noticing zeroed out
> data).
>
> I fail to see how it's offensive to describe this as "shouldn't have
> been merged that way".
>
> Greetings,
>
> Andres Freund

The original mail is here:

https://www.postgresql.org/message-id/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de


Michael

Reply via email to