On Thu, Apr 2, 2015 at 2:55 PM, Tom Lane <[email protected]> wrote:
> Robert Haas <[email protected]> writes:
>> On Thu, Apr 2, 2015 at 2:40 PM, Tom Lane <[email protected]> wrote:
>>> However, I'm having second thoughts about whether we've fully diagnosed
>>> this. Three out of the four failures we've seen in the buildfarm reported
>>> "cache lookup failed for access method 403", not "could not open relation
>>> with OID 2601" ... and I'm so far only able to replicate the latter
>>> symptom. It's really unclear how the former one could arise, because
>>> nothing that vacuum.sql does would change xmin of the rows in pg_am.
>
>> It probably changes the *relfilenode* of pg_am, because it runs VACUUM
>> FULL on that catalog. Perhaps some backend sees the old relfilenode
>> value and tries to do a heap scan, interpreting the now-truncated file as
>> empty?
>
> Yeah, I came up with the same theory a few minutes later. Trying to
> reproduce on that basis.
>
> Actually, now that I think it through, the "could not open relation"
> error is pretty odd in itself. If we are trying to open pg_am using
> a stale catalog snapshot, it seems like we ought to reliably find its
> old pg_class tuple (the one with the obsolete relfilenode), rather than
> finding nothing. But the latter is the behavior I'm seeing.

What's to stop the old tuple from being HOT-pruned?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
