It's another possibility, but I think it's still somewhat remote given how
long we've been using this method with this code. It's sadly hard to test
because taking the full backup without the hard linking is fairly expensive
(the databases comprise multiple terabytes).
As a possibly unsatisfying …
Today we have seen:

2013-05-28 04:11:12.300 EDT,,,30600,,51a41946.7788,1,,2013-05-27 22:41:10 EDT,,0,ERROR,XX000,xlog flush request 1E95/AFB2DB10 is not satisfied --- flushed only to 1E7E/21CB79A0,writing block 9 of relation base/16416/293974676
2013-05-28 04:11:13.316 …
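[For context: an LSN such as 1E95/AFB2DB10 is a 64-bit byte position in the
WAL stream, written as two hex halves (high word / low word). A quick sketch
of the gap between the requested flush point and what was actually flushed,
using plain bash arithmetic; this is not part of the thread itself:

    # An LSN "HI/LO" is the WAL byte position HI * 2^32 + LO.
    requested=$(( 16#1E95 * 2**32 + 16#AFB2DB10 ))  # flush point taken from the page's LSN
    flushed=$((   16#1E7E * 2**32 + 16#21CB79A0 ))  # WAL actually flushed so far
    echo $(( requested - flushed ))                 # ~101 GB

A gap of roughly 100 GB is far too large to be a genuine flush backlog, which
fits the thread's suspicion that the page carries a garbage LSN, i.e. an
invalid on-disk page.]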
On Tue, May 21, 2013 at 11:59 AM, Benedikt Grundmann
bgrundm...@janestreet.com wrote:
We are seeing these errors on a regular basis on the testing box now. We
have even changed the backup script to shut down the hot standby, take an
LVM snapshot, restart the hot standby, and rsync the LVM snapshot. It
still happens.
We have never seen this before we introduced the hot standby. So we …
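[The actual script is not shown in the thread, but the procedure described
amounts to something like the following sketch; all paths, volume names,
sizes, and host names below are invented placeholders:

    pg_ctl -D /database/postgres stop -m fast            # shut down the hot standby cleanly
    lvcreate --snapshot --name pg_backup --size 100G /dev/vg0/pgdata
    pg_ctl -D /database/postgres start                   # standby resumes streaming
    mount -o ro /dev/vg0/pg_backup /mnt/pg_backup
    rsync -a /mnt/pg_backup/ testdb:/database/postgres/  # copy the frozen snapshot to the testing box
    umount /mnt/pg_backup
    lvremove -f /dev/vg0/pg_backup                       # discard the snapshot once copied

Shutting the standby down before snapshotting should make the snapshot
equivalent to a cleanly stopped cluster, which is what makes the continued
errors surprising.]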
Thanks for the response.
I have some evidence against an issue in the backup procedure (though I'm
not ruling it out). We moved back to taking the backup off of the primary
and all errors for all three clusters went away. All of the hardware is
the same, OS and postgres versions are largely the …
I'll try to get the primary upgraded over the weekend when we can afford a
restart.
In the meantime I have a single test showing that a shutdown, snapshot,
restart produces a backup that passes the vacuum analyze test. I'm going
to run a full vacuum today.
-David
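[The "vacuum analyze test" above presumably serves to read through the
restored copy so that missing segments or unreadable blocks surface as
errors. One way to drive it from the shell; the database name is a
placeholder:

    vacuumdb --verbose --analyze --dbname proddb_copy  # scans the tables, reporting any bad blocks
    vacuumdb --full --dbname proddb_copy               # rewrites every table entirely; much slower

The --full variant matches the "full vacuum" David mentions and touches every
block, at the cost of rewriting multi-terabyte tables.]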
On 14.05.2013 23:47, Benedikt Grundmann wrote:
The only thing that is *new* is that we took the snapshot from the
streaming replica. So again my best guess as of now is that if the
database crashes while it is in streaming standby, an invalid disk state
can result during the following …
First, thanks for the replies. This sort of thing is frustrating and hard
to diagnose at a distance, and any help is appreciated.
Here is some more background:
We have 3 9.2.4 databases using the following setup:
- A primary box
- A standby box running as a hot streaming replica from the primary …
On 15.05.2013 15:42, David Powers wrote: [...]
The subject says 9.2.3. Are you sure you're running 9.2.4 on all the
servers? There was a fix to a bug related to starting a standby server
from a filesystem snapshot. I …
Today we have seen this on our testing database instance:
ERROR: could not open file base/16416/291498116.3 (target block 431006):
No such file or directory
That database gets created by rsyncing the LVM snapshot of the standby,
which is a read-only backup of proddb using streaming replication.
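[For reference, the path encodes where to look: 16416 is the database's OID,
291498116 the relation's filenode, and the ".3" suffix the fourth 1 GB
segment; block 431006 at 8 kB per block lands about 3.3 GB into the relation,
i.e. exactly in that segment. A sketch of mapping the filenode back to a
relation; the database name and connection details are placeholders:

    # relfilenode matches the on-disk file name for ordinary tables and indexes.
    psql -d proddb -c "SELECT relname, relkind FROM pg_class WHERE relfilenode = 291498116;"
]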
It's on the production database and the streaming replica. But not on the
snapshot.
production
-rw------- 1 postgres postgres 312778752 May 13 21:28
/database/postgres/base/16416/291498116.3
streaming replica
-rw------- 1 postgres postgres 312778752 May 13 23:50 …
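[A quick way to reproduce this comparison across all three copies; the host
names below are placeholders:

    f=/database/postgres/base/16416/291498116.3
    for h in proddb replicadb testdb; do
        printf '%-12s' "$h"
        ssh "$h" test -e "$f" && echo present || echo MISSING
    done
]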
On 14.05.2013 16:48, Benedikt Grundmann wrote:
[...]
So, the LVM snapshot didn't work correctly?
- Heikki
That's one possible explanation. It's worth noting that we never saw this
before moving to streaming replication, and we had been using that backup
method for a long time.
I think my previous message wasn't clear enough. I do *NOT* think that the
LVM snapshot is the culprit.
However, I cannot discount it as one of the possibilities. But I have no
evidence in either /var/log/messages or in dmesg that the LVM snapshot went
into a bad state, AND we have been using this …