Package: postgresql-11-pg-checksums Version: 0.8-2 Severity: important The following bug report from pgsql-hackers also pertains to pg-checksums in buster:
On Tue, Mar 19, 2019 at 01:00:50PM -0700, Andres Freund wrote: > On 2019-03-20 03:27:55 +0800, Stephen Frost wrote: > > On Tue, Mar 19, 2019 at 23:59 Andres Freund <[email protected]> wrote: > > > On 2019-03-19 16:52:08 +0100, Michael Banck wrote: > > > > Am Dienstag, den 19.03.2019, 11:22 -0400 schrieb Robert Haas: > > > > > It's torn pages that I am concerned about - the server is writing and > > > > > we are reading, and we get a mix of old and new content. We have been > > > > > quite diligent about protecting ourselves from such risks elsewhere, > > > > > and checksum verification should not be held to any lesser standard. > > > > > > > > If we see a checksum failure on an otherwise correctly read block in > > > > online mode, we retry the block on the theory that we might have read a > > > > torn page. If the checksum verification still fails, we compare its LSN > > > > to the LSN of the current checkpoint and don't mind if its newer. This > > > > way, a torn page should not cause a false positive either way I > > > > think?. > > > > > > False positives, no. But there's plenty potential for false > > > negatives. In plenty clusters a large fraction of the pages is going to > > > be touched in most checkpoints. > > > > How is it a false negative? The page was in the middle of being > > written, > > You don't actually know that. It could just be random gunk in the LSN, > and this type of logic just ignores such failures as long as the random > gunk is above the system's LSN. > > And the basebackup logic doesn't just ignore if both the checksum > failed, and the lsn is between startptr and current insertion pointer - > it just does it with *any* page that has a pd_upper != 0 and a pd_lsn > > startptr. Given typical startlsn values (skewing heavily towards lower > int64s), that means that random data is more likely than not to pass > this test. > > As it stands, the logic seems to give more false confidence than > anything else. > > > > > The basebackup checksum verification works in the same way. > > > > > > Shouldn't have been merged that way. > > > > > > I have a hard time not finding this offensive. These issues were > > considered, discussed, and well thought out, with the result being > > committed after agreement. > > Well, I don't know what to tell you. But: > > /* > * Only check pages which have not been > modified since the > * start of the base backup. Otherwise, they > might have been > * written only halfway and the checksum would > not be valid. > * However, replaying WAL would reinstate the > correct page in > * this case. We also skip completely new > pages, since they > * don't have a checksum yet. > */ > if (!PageIsNew(page) && PageGetLSN(page) < > startptr) > { > > doesn't consider plenty scenarios, as pointed out above. It'd be one > thing if the concerns I point out above were actually commented upon and > weighed not substantial enough (not that I know how). But... > > > Do you have any example cases where the code in pg_basebackup has resulted > > in either a false positive or a false negative? Any case which can be > > shown to result in either? > > CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, > 1000000) g(i); > SELECT pg_relation_size('corruptme'); > postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || > pg_relation_filepath('corruptme'); > ┌─────────────────────────────────────┐ > │ ?column? │ > ├─────────────────────────────────────┤ > │ /srv/dev/pgdev-dev/base/13390/16384 │ > └─────────────────────────────────────┘ > (1 row) > dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 > conv=notrunc > > Try a basebackup and see how many times it'll detect the corrupt > data. In the vast majority of cases you're going to see checksum > failures when reading the data for normal operation, but not when using > basebackup (or this new tool). > > At the very very least this would need to do > > a) checks that the page is all zeroes if PageIsNew() (like > PageIsVerified() does for the backend). That avoids missing cases > where corruption just zeroed out the header, but not the whole page. > b) Check that pd_lsn is between startlsn and the insertion pointer. That > avoids accepting just about all random data. > > And that'd *still* be less strenuous than what normal backends > check. And that's already not great (due to not noticing zeroed out > data). > > I fail to see how it's offensive to describe this as "shouldn't have > been merged that way". > > Greetings, > > Andres Freund The original mail is here: https://www.postgresql.org/message-id/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de Michael

