subject:"Re\: \[GENERAL\] ERROR\: unexpected chunk number 0 \(expected 1\) for toast value 76753264 in pg_toast

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-08-30 Thread Christoph Berg

Re: Harry Ambrose 2017-08-25 

> Hi All,
> 
> Sorry to open this can of worms again. However, we are still struggling
> with this issue across quite a large amount of our estate.

Hi,

we've just seen exactly this error on a customer database running
9.5.3 (postgresql95.x86_64 9.5.3-2PGDG.rhel6). Luckily just one tuple
was affected.

Symptoms were:
# select text from k... where id = 719764749;
ERROR:  unexpected chunk number 0 (expected 1) for toast value 3347468184 in 
pg_toast_922511637
The toast table itself was perfectly ok, with a single chunk:
# select * from pg_toast.pg_toast_922511637 where chunk_id = 3347468184;
chunk_id   | 3347468184
chunk_seq  | 0
chunk_data | ...valid text string... (with bytea_output = 'escape')

Updating or deleting the field/row didn't work:
# update k... set text = '...same text as above...' where id = 719764749;
ERROR:  XX000: tuple concurrently updated
ORT:  simple_heap_delete, heapam.c:3171

# delete from k... where id = 719764749;
ERROR:  XX000: tuple concurrently updated
ORT:  simple_heap_delete, heapam.c:3171

The problem persisted over the last two weeks (eventually noticed by
pg_dump starting to fail between August 15th and 18th). The server was
started on July 26th.

Besides the ERRORing statements above, I didn't actively resolve it,
suddenly SELECTing the original row just worked again. According to
pg_stat_user_tables, autovacuum didn't hit in. I can't say if there
were backends open for two weeks. At the time it resolved itself, the
oldest backend was from August 27th.

If xmin/xmax/multixact... data from this server is interesting, I can
extract it on request.

Christoph


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-08-25 Thread Harry Ambrose

Hi All,

Sorry to open this can of worms again. However, we are still struggling
with this issue across quite a large amount of our estate.

>From doing some further research I stumbled across the following which
seems to sum up what we are seeing quite well...

http://grokbase.com/t/postgresql/pgsql-hackers/1321h6dpv7/getoldestxmin-going-backwards-is-dangerous-after-all

The above thread does not state whether a fix was committed, can anyone
confirm/deny?

Have a great weekend (bank holiday for some!)

Best wishes,
Harry

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-08-09 Thread Scott Marlowe

On Wed, Aug 9, 2017 at 6:27 AM, ADSJ (Adam Sjøgren)  wrote:
> On 2017-06-21 Adam Sjøgren  wrote:
>
>> Adam Sjøgren  wrote:
>
>>> Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and
>>> the errors keep appearing the log.
>
> Just to close this, for the record: We haven't seen the errors since
> 2017-06-30. We upgraded to 9.3.17 (latest 9.3 point-release at the time
> of writing) on 2017-06-10.
>
> Whether this means that the affected rows gradually got overwritten
> after switching to .17 and thus got fixed, or if something subtle in our
> workflow changed, so we aren't hitting this anymore, or something else
> entirely is the answer, we're not sure.
>
> We didn't get to trying Alvaro Herrera's suggestion of removing
> 6c243f90ab6904f27fa990f1f3261e1d09a11853 before the errors stopped
> appearing "by themselves".

This sounds a lot like bad sectors getting remapped.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-08-09 Thread Harry Ambrose

Unfortunately we still see it frequently :(

On 9 August 2017 at 14:29, Achilleas Mantzios 
wrote:

> On 09/08/2017 15:27, ADSJ (Adam Sjøgren) wrote:
>
>> On 2017-06-21 Adam Sjøgren  wrote:
>>
>> Adam Sjøgren  wrote:
>>>
 Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and
 the errors keep appearing the log.

>>> Just to close this, for the record: We haven't seen the errors since
>> 2017-06-30. We upgraded to 9.3.17 (latest 9.3 point-release at the time
>> of writing) on 2017-06-10.
>>
>> Whether this means that the affected rows gradually got overwritten
>> after switching to .17 and thus got fixed, or if something subtle in our
>> workflow changed, so we aren't hitting this anymore, or something else
>> entirely is the answer, we're not sure.
>>
> Glad you sorted it out! You have been consistent in your effort to chase
> this down, and reverted back with your findings to close the case. Thumbs
> up!
>
>> We didn't get to trying Alvaro Herrera's suggestion of removing
>> 6c243f90ab6904f27fa990f1f3261e1d09a11853 before the errors stopped
>> appearing "by themselves".
>>
>>
>>Best regards,
>>
>>  Adam
>>
>>
> --
> Achilleas Mantzios
> IT DEV Lead
> IT DEPT
> Dynacom Tankers Mgmt
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-08-09 Thread Achilleas Mantzios


On 09/08/2017 15:27, ADSJ (Adam Sjøgren) wrote:

On 2017-06-21 Adam Sjøgren  wrote:


Adam Sjøgren  wrote:

Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and
the errors keep appearing the log.

Just to close this, for the record: We haven't seen the errors since
2017-06-30. We upgraded to 9.3.17 (latest 9.3 point-release at the time
of writing) on 2017-06-10.

Whether this means that the affected rows gradually got overwritten
after switching to .17 and thus got fixed, or if something subtle in our
workflow changed, so we aren't hitting this anymore, or something else
entirely is the answer, we're not sure.

Glad you sorted it out! You have been consistent in your effort to chase this 
down, and reverted back with your findings to close the case. Thumbs up!

We didn't get to trying Alvaro Herrera's suggestion of removing
6c243f90ab6904f27fa990f1f3261e1d09a11853 before the errors stopped
appearing "by themselves".


   Best regards,

 Adam



--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-08-09 Thread Adam Sjøgren

On 2017-06-21 Adam Sjøgren  wrote:

> Adam Sjøgren  wrote:

>> Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and
>> the errors keep appearing the log.

Just to close this, for the record: We haven't seen the errors since
2017-06-30. We upgraded to 9.3.17 (latest 9.3 point-release at the time
of writing) on 2017-06-10.

Whether this means that the affected rows gradually got overwritten
after switching to .17 and thus got fixed, or if something subtle in our
workflow changed, so we aren't hitting this anymore, or something else
entirely is the answer, we're not sure.

We didn't get to trying Alvaro Herrera's suggestion of removing
6c243f90ab6904f27fa990f1f3261e1d09a11853 before the errors stopped
appearing "by themselves".

  Best regards,

Adam

-- 
 "My Dear Babbage. I am in much dismay at having  Adam Sjøgren
  got into so amazing a quagmire & botheration with a...@novozymes.com
  these Numbers, that I cannot possibly get the
  thing done today."

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-23 Thread Harry Ambrose

Hi Everyone,

Still trying to fathom this one. I have added quite a few log lines to a
copy of 9.4.12 and compiled it hoping to find the fault.

Below is from the log (at DEBUG5). Apologies for my name in the log lines,
it was the easiest way to grep them specifically I also apologise that its
a bit messy, i'm not a C dev.

This excerpt is without failure:

127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  HARRYAMBROSE - UPDATE - Old tuple is 0, new tuple is 0,
table is: 2345873096
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  HARRYAMBROSE - UPDATE - Old tuple is 0, new tuple is 0,
table is: 2345873096
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  HARRYAMBROSE - UPDATE - Old tuple is 0, new tuple is 0,
table is: 2345873096
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  HARRYAMBROSE - index_beginscan_internal scan: 57267920
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  HARRYAMBROSE - index_beginscan_internal scan: 57267920
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  HARRYAMBROSE - index_beginscan_internal scan: 57267920
127.0.0.1 2017-06-23 09:45:26.083 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
127.0.0.1 2017-06-23 09:45:26.084 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  CommitTransaction
127.0.0.1 2017-06-23 09:45:26.084 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
127.0.0.1 2017-06-23 09:45:26.084 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  name: unnamed; blockState:   STARTED; state: INPROGR,
xid/subid/cid: 73603293/1/0 (used), nestlvl: 1, children:
127.0.0.1 2017-06-23 09:45:26.084 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
127.0.0.1 2017-06-23 09:45:26.084 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  StartTransaction
127.0.0.1 2017-06-23 09:45:26.084 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
127.0.0.1 2017-06-23 09:45:26.084 BST  4061 594ccaa6.fdd postgres [unknown]
 0 DEBUG:  name: unnamed; blockState:   DEFAULT; state: INPROGR,
xid/subid/cid: 0/1/0, nestlvl: 1, children:
127.0.0.1 2017-06-23 09:45:26.084 BST  4061 594ccaa6.fdd postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab

This is with the failure, interesting that there are multiple flush's
nearby, could be total coincidence though?

127.0.0.1 2017-06-23 10:28:25.862 BST  1219 594cd5ba.4c3 postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
**.*.**.*** 2017-06-23 10:28:25.996 BST  24765 594bdf98.60bd postgres
walreceiver  0 DEBUG:  write 2F81/E7DA8000 flush 2F81/E7D9 apply
2F81/E7D8FBA0
**.*.**.*** 2017-06-23 10:28:25.996 BST  24765 594bdf98.60bd postgres
walreceiver  0 DEBUG:  write 2F81/E7DA8000 flush 2F81/E7DA8000 apply
2F81/E7D8FBA0

sent 16492 bytes  received 24607 bytes  82198.00 bytes/sec
total size is 16777216  speedup is 408.21
 2017-06-23 10:28:26.014 BST  24752 594bdf95.60b00 DEBUG:  archived
transaction log file "00012F8100E5"
 2017-06-23 10:28:26.018 BST  24752 594bdf95.60b00 DEBUG:
 executing archive command "rsync -e ssh -arv
 /wal/pg_xlog/00012F8100E6 postgres@:/wal/pg_xlog"
127.0.0.1 2017-06-23 10:28:26.101 BST  1219 594cd5ba.4c3 postgres [unknown]
 0 DEBUG:  HARRYAMBROSE - index_beginscan_internal scan: 630163208
127.0.0.1 2017-06-23 10:28:26.101 BST  1219 594cd5ba.4c3 postgres [unknown]
 0 STATEMENT:  VACUUM (FULL, ANALYZE) ctab
sending incremental file list

sent 69 bytes  received 12 bytes  162.00 bytes/sec
total size is 16777216  speedup is 207126.12
 2017-06-23 10:28:26.200 BST  24752 594bdf95.60b00 DEBUG:  archived
transaction log file "00012F8100E6"
 2017-06-23 10:28:26.201 BST  24752 594bdf95.60b00 DEBUG:
 executing archive command "rsync -e ssh -arv
 /wal/pg_xlog/00012F8100E7 postgres@:/wal/pg_xlog"
**.*.**.*** 2017-06-23 10:28:26.203 BST  24765 594bdf98.60bd postgres
walreceiver  0 DEBUG:  write 2F81/E7DC8000 flush 2F81/E7DA8000 apply
2F81/E7DA7FC8
**.*.**.*** 2017-06-23 10:28:26.203 BST

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-21 Thread Adam Sjøgren

Adam Sjøgren  wrote:

> Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and
> the errors keep appearing the log.

Just a quick update with more observations:

All the errors in the postgres.log from one of the tables are triggered
by a stored procedure that gathers data to put in a field used for full
text search - this stored procedure is called by a before update trigger
on the table. We have only seen it in the log, but not been able to
reproduce it.

We have, however, now got a row in the other big table where we can get
the error just by running a SELECT * on the row, in psql:

  user@server db=# select * from ourschema.table_a where id = 6121931;
  ERROR:  unexpected chunk number 0 (expected 1) for toast value 339846807 in 
pg_toast_10919630
  user@server db=# 

Which is both nice - we can show the error on demand - but also more
worrying, I guess, because that means the problem is "on disk".

Running this in a stored procedure over the record in question:

>   SELECT *
>   INTO rec
>   FROM table_a where id = badid;
>   detoast := substr(rec.fts::text,1,2000);
>   exception
>   when others then
>   raise notice 'data for table_a id: % is corrupt', badid;
>   continue;

also shows the error:

  user@server db=# SELECT ourschema.check_sequence(6121931, 6121931);
  NOTICE:  data for table_a id: 6121931 is corrupt
   check_sequence 

  (1 row)

We are running this over the entire (160M+ row) table now, to see if any
other rows are affected.

So, we can reproduce the error message, but we can't reproduce the
problem from scratch.

Any ideas on what to look at, given a non-transient problem-row?

Our next step will be to try to switch to 9.3.17 with
6c243f90ab6904f27fa990f1f3261e1d09a11853 reverted as suggested by Alvaro
Herrera last week.

  Best regards,

Adam

-- 
 "Lägg ditt liv i min handAdam Sjøgren
  Sälj din själ till ett band"  a...@novozymes.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-13 Thread Tom Lane

Harry Ambrose  writes:
> Not sure whether its relevant or not, however upon adding an ANALYSE before
> the second vacuum the issue has not presented when testing. I have managed
> 95 cycles thus far.

I'm still unable to reproduce :-( --- I ran about two dozen cycles
overnight with no sign of trouble.  This time I was using a master/slave
pair with the test database in a non-default partition, so neither of
those aspects seem to be key after all.

I suspect the reason for it being so hard to reproduce is that there's
a timing window involved.  But that doesn't offer much to go on in
terms of being able to make a more reproducible case.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-13 Thread Harry Ambrose

Hi,

Not sure whether its relevant or not, however upon adding an ANALYSE before
the second vacuum the issue has not presented when testing. I have managed
95 cycles thus far.

BEGIN;
CREATE TABLE x (id BIGSERIAL PRIMARY KEY, payload1 VARCHAR, payload2
VARCHAR, payload3 VARCHAR, payload4 BIGINT, payload5 BIGINT);
/* Repeat until 2,000,000 rows are inserted */
INSERT INTO x (id, payload1, payload2, payload3, payload4, payload5) VALUES
(random values of varying length/size to force random toast usage);
COMMIT;

VACUUM (ANALYZE, FULL);

BEGIN;
/* Repeat until all 2,000,000 rows are updated */
UPDATE x SET payload1 = , payload2 = , payload3 = , payload4 = , payload5 =
... again random values of varying length/size to force random toast usage
COMMIT;

ANALYZE x; -- <== New analyse here.

VACCUM (ANALYZE, FULL);

(...)

Vacuum end. 30 sec sleep

Update selective

Inserting the rows

update 0

update 2

update 4

update 6

update 8

update 10

update 12

update 14

update 16

update 18

Update all

Attempting vacuum

Vacuum completed

dropping the table

=

New attempt - number 96

Inserting the rows

Executing  0

Executing  4

Executing  8

Executing  12

Executing  16

Executing  20

Executing  24

Executing  28

(...)

Many thanks,
Harry

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-12 Thread Harry Ambrose

Hi,

BTW, how do you get that jar to make the test table on a non-default
> tablespace?  Or are you just putting the whole test DB on a tablespace?
>
> regards, tom lane
>

I have been putting the whole database on a tablespace. It seemed easier
than modifying the jar.

Many thanks,
Harry

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-12 Thread Tom Lane

Harry Ambrose  writes:
> - Custom tablespaces (where the errors occur) sat on 4 disk RAID-10 (ext3
> filesystem).

BTW, how do you get that jar to make the test table on a non-default
tablespace?  Or are you just putting the whole test DB on a tablespace?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-12 Thread Tom Lane

Achilleas Mantzios  writes:
> After 2 full attempts, (and after bringing my poor - old workstation to its 
> knees) it still does not produce the supposed ERROR :

Yeah, I've had little luck reproducing it either.  However, I noticed a
few messages back that Harry is testing against a master/slave setup not
just a standalone server.  Maybe that's not as irrelevant as it should be.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-12 Thread Harry Ambrose

It seems to be very hit and miss...

The below is from the machine described in this thread running PostgreSQL
9.4.10:

update 10

update 12

update 14

update 16

update 18

Update all

Vacuum

org.postgresql.util.PSQLException: ERROR: unexpected chunk number 2285
(expected 0) for toast value 187504167 in pg_toast_187504156

at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2455)

at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2155)

at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:288)

at
org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:430)

at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356)

at
org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:168)

at
org.postgresql.jdbc.PgCallableStatement.executeWithFlags(PgCallableStatement.java:78)

at
org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:157)

at Start.execute(Start.java:118)

at Start.main(Start.java:20)

=

New attempt - number 2

Inserting the rows

Executing  0

Executing  4

Executing  8

Executing  12

Executing  16

Executing  20

Thank you for trying :)

Best wishes,
Harry

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-12 Thread Harry Ambrose

Hi,

pls tell me, I am currently running 2nd run in my box, (New attempt 2), and
> its in the "Attempting vacuum" phase.
> What is it supposed to do next?
> I got no errors , it has gotten my machine to its knees.
>

The jar has an endless while loop. Thus please kill the PID when you are
done testing. It loosely follows the following:

BEGIN;
CREATE TABLE x (id BIGSERIAL PRIMARY KEY, payload1 VARCHAR, payload2
VARCHAR, payload3 VARCHAR, payload4 BIGINT, payload5 BIGINT);
/* Repeat until 2,000,000 rows are inserted */
INSERT INTO x (id, payload1, payload2, payload3, payload4, payload5) VALUES
(random values of varying length/size to force random toast usage);
COMMIT;

VACUUM (ANALYZE, FULL);

BEGIN;
/* Repeat until all 2,000,000 rows are updated */
UPDATE x SET payload1 = , payload2 = , payload3 = , payload4 = , payload5 =
... again random values of varying length/size to force random toast usage
COMMIT;

VACCUM (ANALYZE, FULL);

If the error is going to occur it will happen during the second "Attempting
vacuum" phase.

1st vacuum:

c.setAutoCommit(true);

System.out.println("Attempting vacuum");

c.prepareCall("VACUUM (FULL, ANALYZE, VERBOSE) x").execute();

System.out.println("Vacuum end. 30 sec sleep");

Thread.sleep(6);

2nd vacuum:

System.out.println("Attempting vacuum");

c.prepareCall("VACUUM (FULL, ANALYZE, VERBOSE) x").execute();

System.out.println("Vacuum completed");

Thread.sleep(3);

System.out.println("Dropping the table");

c.createStatement().execute("drop table if exists x");

Thread.sleep(3);

Hope this helps!

Best wishes,
Harry

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-12 Thread Achilleas Mantzios


On 12/06/2017 10:46, Harry Ambrose wrote:

Hi,

> Their suggestion is to upload to Google Drive. That or use a third party 
site, like Dropbox.

I have uploaded the jar to dropbox, link below (please let me know if you have 
any issues downloading):

https://www.dropbox.com/s/96vm465i7rwhcf8/toast-corrupter-aio.jar?dl=0


After 2 full attempts, (and after bringing my poor - old workstation to its 
knees) it still does not produce the supposed ERROR :

update 16
update 18
Updated all
Attempting vacuum
Vacuum completed
Dropping the table
=
New attempt - number 3
Creating the table if it does not exist
Inserting the rows
Executing  0
Executing  4
^C

PostgreSQL version : 9.3.4


Best wishes,
Harry



--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-12 Thread Harry Ambrose

Hi,

> Their suggestion is to upload to Google Drive. That or use a third party
site, like Dropbox.

I have uploaded the jar to dropbox, link below (please let me know if you
have any issues downloading):

https://www.dropbox.com/s/96vm465i7rwhcf8/toast-corrupter-aio.jar?dl=0

> So I guess you run memtest86+ and it reported that your memory is indeed
ECC and also that it is working properly?

Correct, there are no issues reported. The issue can also be reproduced on
multiple different environments making the likelihood of a bad memory slim.

Best wishes,
Harry

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-12 Thread Adam Sjøgren

Alvaro Herrera  wrote:

> ADSJ (Adam Sjøgren) wrote:
>
>> Our database has started reporting errors like this:
>> 
>>   2017-05-31 13:48:10 CEST ERROR:  unexpected chunk number 0 (expected 1) 
>> for toast value 14242189 in pg_toast_10919630

> Does the problem still reproduce if you revert commit
> 6c243f90ab6904f27fa990f1f3261e1d09a11853?

I will try and get back to you with the results (building new .deb
packages as I type this).

Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and
the errors keep appearing the log.

We have tried running a function similar to the one described in
http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html as
suggested by Adrian Klaver, but we haven't been able to get any errors
from that.

This is the function we have run over our two tables:

  CREATE OR REPLACE FUNCTION check_table_a(from_id int, to_id int)
  RETURNS VOID LANGUAGE PLPGSQL AS
  $f$
  declare
  curid INT := 0;
  rec RECORD;
  badid INT;
  detoast TEXT;
  begin
  FOR badid IN SELECT id FROM table_a where id >= from_id and id <= to_id LOOP
  curid = curid + 1;
  if curid % 1 = 0 then
  raise notice '% rows inspected (%, %,%)', curid, badid, from_id, 
to_id;
  end if;
  begin
  SELECT *
  INTO rec
  FROM table_a where id = badid;
  detoast := substr(rec.fts::text,1,2000);
  exception
  when others then
  raise notice 'data for table_a id: % is corrupt', badid;
  continue;
  end;
  end loop;
  end;
  $f$;

  -- The other function has:
  --
  detoast := substr(vcontent.document,1,2000);
  --
  -- and is otherwise identical.

But no 'data for table... is corrupt' is printed.

We are only substr()'ing one field (which we know is big) for each row.
Should we do so for _all_ fields? Is there an elegant way to do so?

  Best regards,

Adam

-- 
 "Lägg ditt liv i min handAdam Sjøgren
  Sälj din själ till ett band"  a...@novozymes.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-12 Thread Achilleas Mantzios


On 09/06/2017 19:02, Harry Ambrose wrote:

Hi,
No error messages found.

- is your RAM ECC? Did you run any memtest?

Yes, memory is ECC. No error messages found.


So I guess you run memtest86+ and it reported that your memory is indeed ECC 
and also that it is working properly?

Best wishes,
Harry



--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-11 Thread Adrian Klaver


On 06/11/2017 08:34 AM, Tom Lane wrote:

Alvaro Herrera  writes:

I'm unable to run this file.  Maybe it was corrupted in transit, given
that it was considered to be text.


My copy came through fine, so I tried to forward it to you off-list,
but gmail rejected it as a possible security hazard.  Do you know
which binary mime types they won't reject?


Well this is a list that they do reject:

https://support.google.com/mail/answer/6590?hl=en

So I guess you can work backwards from there.

Their suggestion is to upload to Google Drive. That or use a third party 
site, like Dropbox.




regards, tom lane





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-11 Thread Tom Lane

Alvaro Herrera  writes:
> I'm unable to run this file.  Maybe it was corrupted in transit, given
> that it was considered to be text.

My copy came through fine, so I tried to forward it to you off-list,
but gmail rejected it as a possible security hazard.  Do you know
which binary mime types they won't reject?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-11 Thread Alvaro Herrera

Harry Ambrose wrote:
> Hi,
> 
> Please find the jar attached (renamed with a .txt extension as I know some
> email services deem jars a security issue).
> 
> The jar accepts the following arguments:
> 
> $1 = host
> $2 = database
> $3 = username
> $4 = password
> $5 = port
> 
> It returns its logging to STDOUT. Please let me know if you require further
> info.

I'm unable to run this file.  Maybe it was corrupted in transit, given
that it was considered to be text.

$ md5sum toast-corrupter-aio.jar 
7b1f5854c286f9b956b9442afd455b7a  toast-corrupter-aio.jar

$ java -jar toast-corrupter-aio.jar 
Error: Invalid or corrupt jarfile toast-corrupter-aio.jar

Even unzip complains (after extracting a bunch of JDBC .class files)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-10 Thread Alvaro Herrera

ADSJ (Adam Sjøgren) wrote:
> Our database has started reporting errors like this:
> 
>   2017-05-31 13:48:10 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
> toast value 14242189 in pg_toast_10919630

Does the problem still reproduce if you revert commit
6c243f90ab6904f27fa990f1f3261e1d09a11853?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-10 Thread Harry Ambrose

Hi,

Please find the jar attached (renamed with a .txt extension as I know some
email services deem jars a security issue).

The jar accepts the following arguments:

$1 = host
$2 = database
$3 = username
$4 = password
$5 = port

It returns its logging to STDOUT. Please let me know if you require further
info.

Best wishes,
Harry

On 7 June 2017 at 17:46, Tom Lane  wrote:

> Harry Ambrose  writes:
> > Tom - I can provide a jar that I have been using to replicate the issue.
> Whats the best transport method to send it over?
>
> If it's not enormous, just send it as an email attachment.
>
> regards, tom lane
>


Unsupported File Types Alert.txt
Description: Unsupported File Types Alert.txt

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-10 Thread Harry Ambrose

Hi,

Please find the jar attached (renamed with a .txt extension as I know some
email services deem jars a security issue).

The jar accepts the following arguments:

$1 = host
$2 = database
$3 = username
$4 = password
$5 = port

It returns its logging to STDOUT. Please let me know if you require further
info.

Best wishes,
Harry

On 7 June 2017 at 17:46, Tom Lane  wrote:

> Harry Ambrose  writes:
> > Tom - I can provide a jar that I have been using to replicate the issue.
> Whats the best transport method to send it over?
>
> If it's not enormous, just send it as an email attachment.
>
> regards, tom lane
>


Unsupported File Types Alert.txt
Description: Unsupported File Types Alert.txt

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-09 Thread Tom Lane

Robert Lakes  writes:
> I am new to postgress and I am trying to write my first function to insert,
> update or delete and trap errors as a result of the table not existing ,
> the columns not exist or if any other error simply pass back the sqlstate

Please do not hijack an existing thread to ask an unrelated question.
Start a new thread (ie "compose" don't "reply") and use an appropriate
subject line.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-09 Thread Robert Lakes

Ha guys,
I am new to postgress and I am trying to write my first function to insert,
update or delete and trap errors as a result of the table not existing ,
the columns not exist or if any other error simply pass back the sqlstate
here's my code can you help
CREATE OR REPLACE FUNCTION listings_audit() RETURNS TRIGGER AS
$listings_audit$
  BEGIN
IF (TG_OP = 'DELETE') THEN
 IF (EXISTS (
  SELECT 1
  FROM pg_catalog.pg_class c
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'schema_name'
  AND   c.relname = 'table_name'
  AND   c.relkind = 'r' -- only tables
)) THEN
   INSERT INTO listings_changes
 SELECT now(), 'DELETE', OLD.*;
   RETURN OLD;
   ELSE RAISE EXCEPTION 'Table does not exists';
 END IF;
ELSIF (TG_OP = 'UPDATE') THEN
  IF (EXISTS (
  SELECT 1
  FROM pg_catalog.pg_class c
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'schema_name'
  AND   c.relname = 'table_name'
  AND   c.relkind = 'r' -- only tables
)) THEN
   INSERT INTO listings_changes
 SELECT now(), 'UPDATE', NEW.*;
   RETURN NEW;
   ELSE RAISE EXCEPTION 'Table does not exists';
 END IF;
ELSEIF (TG_OP = 'INSERT') THEN

   INSERT INTO listings_changes
 SELECT now(), 'INSERT', NEW.*;
   RETURN NEW;

END IF;
EXCEPTION
WHEN SQLSTATE '42611' THEN
  RAISE EXCEPTION 'Columns do not match audit file does not match user
file';
WHEN SQLSTATE '42P16' THEN
  RAISE EXCEPTION 'Table does not exists';
WHEN OTHERS THEN
  RAISE EXCEPTION 'PostgresSQL error code that has occurred';
RETURN SQLSTATE;
END;
$listings_audit$ LANGUAGE plpgsql;

On Thu, Jun 8, 2017 at 12:49 PM, Tom Lane  wrote:

> Harry Ambrose  writes:
> > Please find the jar attached (renamed with a .txt extension as I know
> some
> > email services deem jars a security issue).
>
> Hmm, the output from this script reminds me quite a lot of one I was
> sent in connection with bug #1 awhile back:
> https://www.postgresql.org/message-id/20161201165505.
> 4360.28203%40wrigleys.postgresql.org
> Was that a colleague of yours?
>
> Anyway, the bad news is I couldn't reproduce the problem then and I can't
> now.  I don't know if it's a timing issue or if there's something critical
> about configuration that I'm not duplicating.  Can you explain what sort
> of platform you're testing on, and what nondefault configuration settings
> you're using?
>
> regards, tom lane
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-09 Thread Harry Ambrose

Hi,

Maybe you could give some info on :
> - your ext3 mkfs and mount options (journal, barriers, etc)
>
/etc/fstab details below:
LABEL=/var/lib/pgsql/var/lib/pgsql   ext3defaults
 1 2
LABEL=/tablespace1  /tablespace1ext3defaults
 1 2
LABEL=/tablespace2  /tablespace2ext3defaults
 1 2
LABEL=/tablespace3  /tablespace3ext3defaults
 1 2
pg_default:

Filesystem features:  has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file

Filesystem flags: signed_directory_hash

Default mount options:(none)

Filesystem state: clean

Errors behavior:  Continue

Filesystem OS type:   Linux

Inode count:  36634624

Block count:  146506767

Reserved block count: 7325338

Free blocks:  143785740

Free inodes:  36627866

First block:  0

Block size:   4096

Fragment size:4096

Reserved GDT blocks:  989

Blocks per group: 32768

Fragments per group:  32768

Inodes per group: 8192

Inode blocks per group:   512

RAID stride:  64

RAID stripe width:64

Filesystem created:   Fri Aug  9 16:11:53 2013

Last mount time:  Fri Apr 21 22:37:02 2017

Last write time:  Fri Apr 21 22:37:02 2017

Mount count:  2

Maximum mount count:  100

Last checked: Thu Sep 15 18:52:43 2016

Check interval:   31536000 (12 months, 5 days)

Next check after: Fri Sep 15 18:52:43 2017

Reserved blocks uid:  0 (user root)

Reserved blocks gid:  0 (group root)

First inode:  11

Inode size:   256

Required extra isize: 28

Desired extra isize:  28

Journal inode:8

Default directory hash:   half_md4

Journal backup:   inode blocks
tablespaces

Filesystem features:  has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file

Filesystem flags: signed_directory_hash

Default mount options:(none)

Filesystem state: clean

Errors behavior:  Continue

Filesystem OS type:   Linux

Inode count:  73261056

Block count:  293013543

Reserved block count: 14650677

Free blocks:  286208439

Free inodes:  73174728

First block:  0

Block size:   4096

Fragment size:4096

Reserved GDT blocks:  954

Blocks per group: 32768

Fragments per group:  32768

Inodes per group: 8192

Inode blocks per group:   512

RAID stride:  64

RAID stripe width:128

Filesystem created:   Fri Aug  9 16:11:53 2013

Last mount time:  Fri Apr 21 22:37:02 2017

Last write time:  Fri Apr 21 22:37:02 2017

Mount count:  2

Maximum mount count:  100

Last checked: Thu Sep 15 18:52:43 2016

Check interval:   31536000 (12 months, 5 days)

Next check after: Fri Sep 15 18:52:43 2017

Reserved blocks uid:  0 (user root)

Reserved blocks gid:  0 (group root)

First inode:  11

Inode size:   256

Required extra isize: 28

Desired extra isize:  28

Journal inode:8

Default directory hash:   half_md4

Journal backup:   inode blocks

> - your controller setup (battery should be working good and cache mode set
> to write back)
>
Cache Board Present: True
Cache Status: OK
Cache Ratio: 10% Read / 90% Write
Drive Write Cache: Disabled
Total Cache Size: 2.0 GB
Total Cache Memory Available: 1.8 GB
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: False
SSD Caching Version: 1
Cache Backup Power Source: Capacitors
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK

> - your disks setup (write cache should be disabled)
>
Write cache is disabled, see above.

> - you should check your syslogs/messages for any errors related to storage
>
No error messages found.

> - is your RAM ECC? Did you run any memtest?
>
Yes, memory is ECC. No error messages found.

> - is your CPU overheating ?
>
No overheating issues.

> - have you experienced any crashes/freezes ?
>
No crashes/freezes experienced.

Best wishes,
Harry

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-09 Thread Achilleas Mantzios


On 09/06/2017 14:44, Harry Ambrose wrote:

Hi Tom,

Thanks for attempting to replicate the issue.

Anyway, the bad news is I couldn't reproduce the problem then and I can't
now.  I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating.  Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?


Further details about the environment that I can replicate on below:

- Non default postgresql.conf settings:
checkpoint_segments = 192
checkpoint_completion_target = 0.9
checkpoint_timeout = 5min
wal_keep_segments = 256
wal_writer_delay = 200ms
archive_mode = on
archive_command = 'rsync -e ssh -arv  /wal/pg_xlog/%f postgres@:/wal/pg_xlog'
archive_timeout = 60
syslog_facility = 'LOCAL0'
log_statement = 'mod'
syslog_ident = 'postgres'
log_line_prefix = '%h %m  %p %c %u %a  %e '
log_timezone = 'GB'
track_activities = on
track_counts = on
datestyle = 'iso, mdy'
timezone = 'GB'
default_text_search_config = 'pg_catalog.english'
array_nulls = on
sql_inheritance = on
standard_conforming_strings = on
synchronize_seqscans = on
transform_null_equals = off
- Two node master/slave setup using streaming replication (without slots).
- CentOS 6.9 (2.6.32-696.el6.x86_64).
- PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 
20120313 (Red Hat 4.4.7-17), 64-bit.
- 64GiB RAM.
- AMD Opteron(TM) Processor 6238.
- pg_default sat on 2 disk RAID-1 conifugration (ext3 filesystem).
- Custom tablespaces (where the errors occur) sat on 4 disk RAID-10 (ext3 
filesystem).
- All disks are HP 600G SAS 6.0Gbps with P420(i) controllers and battery backed 
cache enabled.


Maybe you could give some info on :
- your ext3 mkfs and mount options (journal, barriers, etc)
- your controller setup (battery should be working good and cache mode set to 
write back)
- your disks setup (write cache should be disabled)
- you should check your syslogs/messages for any errors related to storage
- is your RAM ECC? Did you run any memtest?
- is your CPU overheating ?
- have you experienced any crashes/freezes ?



Please let me know if you require further info.

Best wishes,
Harry




--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-09 Thread Harry Ambrose

Hi Tom,

Thanks for attempting to replicate the issue.

Anyway, the bad news is I couldn't reproduce the problem then and I can't
> now.  I don't know if it's a timing issue or if there's something critical
> about configuration that I'm not duplicating.  Can you explain what sort
> of platform you're testing on, and what nondefault configuration settings
> you're using?
>

Further details about the environment that I can replicate on below:

- Non default postgresql.conf settings:
checkpoint_segments = 192
checkpoint_completion_target = 0.9
checkpoint_timeout = 5min
wal_keep_segments = 256
wal_writer_delay = 200ms
archive_mode = on
archive_command = 'rsync -e ssh -arv  /wal/pg_xlog/%f postgres@:/wal/pg_xlog'
archive_timeout = 60
syslog_facility = 'LOCAL0'
log_statement = 'mod'
syslog_ident = 'postgres'
log_line_prefix = '%h %m  %p %c %u %a  %e '
log_timezone = 'GB'
track_activities = on
track_counts = on
datestyle = 'iso, mdy'
timezone = 'GB'
default_text_search_config = 'pg_catalog.english'
array_nulls = on
sql_inheritance = on
standard_conforming_strings = on
synchronize_seqscans = on
transform_null_equals = off
- Two node master/slave setup using streaming replication (without slots).
- CentOS 6.9 (2.6.32-696.el6.x86_64).
- PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit.
- 64GiB RAM.
- AMD Opteron(TM) Processor 6238.
- pg_default sat on 2 disk RAID-1 conifugration (ext3 filesystem).
- Custom tablespaces (where the errors occur) sat on 4 disk RAID-10 (ext3
filesystem).
- All disks are HP 600G SAS 6.0Gbps with P420(i) controllers and battery
backed cache enabled.

Please let me know if you require further info.

Best wishes,
Harry

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-08 Thread Tom Lane

Harry Ambrose  writes:
> Please find the jar attached (renamed with a .txt extension as I know some
> email services deem jars a security issue).

Hmm, the output from this script reminds me quite a lot of one I was
sent in connection with bug #1 awhile back:
https://www.postgresql.org/message-id/20161201165505.4360.28203%40wrigleys.postgresql.org
Was that a colleague of yours?

Anyway, the bad news is I couldn't reproduce the problem then and I can't
now.  I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating.  Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-08 Thread Adrian Klaver


On 06/08/2017 08:13 AM, ADSJ (Adam Sjøgren) wrote:

Achilleas writes:





Anyone has a handy little script lying around?


http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html




   Thanks for the response!

 Adam




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-08 Thread Adam Sjøgren

Harry writes:

> The second vacuum causes an ERROR identical to that you are reporting
> below (unexpected chunk number n (expected n) for toast value...).
> However it may take up to ten attempts to replicate it.

Interesting.

> Out of interest, are you using any tablespaces other than pg_default?
> I can only replicate the issue when using separately mounted
> tablespaces.

No, we are using pg_default only.

I hope your finding can be reproduced, it would be really interesting to
see.


  Best regards,

Adam

-- 
 "Lägg ditt liv i min handAdam Sjøgren
  Sälj din själ till ett band"  a...@novozymes.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-08 Thread Adam Sjøgren

Achilleas writes:

> First try to find which tables those toast relations refer to :
> select 10919630::regclass , 10920100::regclass ;
> Are those critical tables? Can you restore them somehow?

They are our two big tables, containing the bulk of our data (one with
168M rows, the other with 320M rows).

They are constantly being updated, but if I can identify the affected
rows, I can restore a backup on another machine and cherry pick them
from there.

> Also you may consider
> REINDEX TABLE pg_toast.pg_toast_10920100;
> REINDEX TABLE pg_toast.pg_toast_10919630;
> REINDEX TABLE ;
> REINDEX TABLE ;
>
> also VACUUM the above tables.

Yes, but I'd like to know find out why it happens, because cleaning up
and having the corruption reoccur is not so fun.

> You might want to write a function which iterates over the damaged
> table's rows in order to identify the damaged row(s). And then do some
> good update to create a new version.

Yes - we started by doing a quick pg_dump, but I guess we should switch
to something that can tell us exactly what rows hit the problem.

Anyone has a handy little script lying around?

  Thanks for the response!

Adam

-- 
 "Lägg ditt liv i min handAdam Sjøgren
  Sälj din själ till ett band"  a...@novozymes.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-07 Thread Tom Lane

Harry Ambrose  writes:
> Tom - I can provide a jar that I have been using to replicate the issue. 
> Whats the best transport method to send it over?

If it's not enormous, just send it as an email attachment.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-07 Thread Harry Ambrose

Hi,

Thanks for the responses.

> "One lesson I learned from the BSD camp when dealing with random freezes and 
> panics : when all else fails to give an answer it is time to start blaming my 
> hardware. Are those tablespaces on any cheap SSD's ?”

The tablespaces are not sat on SSD’s. Something I had also considered.

Tom - I can provide a jar that I have been using to replicate the issue. Whats 
the best transport method to send it over?

Best wishes,
Harry

> On 7 Jun 2017, at 16:27, Tom Lane  wrote:
> 
> Harry Ambrose  writes:
>> I have been following the updates to the 9.4 branch hoping a fix will 
>> appear, but sadly no luck yet. I have manually replicated the issue on 
>> 9.4.4, 9.4.10 and 9.4.12. My replication steps are:
> 
> This is a very interesting report, but you didn't actually provide a
> reproducer, just a handwavy outline.  If you submit a script that
> makes this happen, we will most definitely look into it.  But
> people aren't going to be excited about trying to reverse-engineer
> a test case out of a vague description.
> 
>> I also found the following has been reported:
>> https://www.postgresql.org/message-id/20161201165505.4360.28...@wrigleys.postgresql.org
> 
> That person never came back with a self-contained test case, either.
> 
> https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
> 
>   regards, tom lane



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-07 Thread Tom Lane

Harry Ambrose  writes:
> I have been following the updates to the 9.4 branch hoping a fix will appear, 
> but sadly no luck yet. I have manually replicated the issue on 9.4.4, 9.4.10 
> and 9.4.12. My replication steps are:

This is a very interesting report, but you didn't actually provide a
reproducer, just a handwavy outline.  If you submit a script that
makes this happen, we will most definitely look into it.  But
people aren't going to be excited about trying to reverse-engineer
a test case out of a vague description.

> I also found the following has been reported:
> https://www.postgresql.org/message-id/20161201165505.4360.28...@wrigleys.postgresql.org

That person never came back with a self-contained test case, either.

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-07 Thread Achilleas Mantzios


On 07/06/2017 17:49, Harry Ambrose wrote:

Hi,
Out of interest, are you using any tablespaces other than pg_default? I can 
only replicate the issue when using separately mounted tablespaces.
One lesson I learned from the BSD camp when dealing with random freezes and panics : when all else fails to give an answer it is time to start blaming my hardware. Are those tablespaces on any cheap 
SSD's ?


I have been investigating this quite extensively and everything I can find on the web suggests data corruption. However running the the following DO reports no errors and I can dump the database 
without issue.



You don't use index when pg_dump . If only the index is corrupted you can get 
away with dump/reload (but for big DBs this is unrealistic)

I also found the following has been reported: 
https://www.postgresql.org/message-id/20161201165505.4360.28...@wrigleys.postgresql.org

Best wishes,
Harry


On 7 Jun 2017, at 15:22, Achilleas Mantzios > wrote:

On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:

Our database has started reporting errors like this:

  2017-05-31 13:48:10 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 14242189 in pg_toast_10919630
  ...
  2017-06-01 11:06:56 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 19573520 in pg_toast_10919630

(157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
corresponds to a table with around 168 million rows.

These went away, but the next day we got similar errors from another
table:

  2017-06-02 05:59:50 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 47060150 in pg_toast_10920100
  ...
  2017-06-02 06:14:54 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 47226455 in pg_toast_10920100

(Only 4 this time) pg_toast_10920100 corresponds to a table with holds
around 320 million rows (these are our two large tables).

The next day we got 6 such errors and the day after 10 such errors. On
June 5th we got 94, yesterday we got 111, of which one looked a little
different:

  2017-06-06 17:32:21 CEST ERROR:  unexpected chunk size 1996 (expected 1585) 
in final chunk 0 for toast value 114925100 in pg_toast_10920100

and today the logs have 65 lines, ending with these:

  2017-06-07 14:49:53 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 131114834 in pg_toast_10920100
  2017-06-07 14:53:41 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 131149566 in pg_toast_10920100

First try to find which tables those toast relations refer to :
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?

Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE ;
REINDEX TABLE ;

also VACUUM the above tables.

You might want to write a function which iterates over the damaged table's rows 
in order to identify the damaged row(s). And then do some good update to create 
a new version.


The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
TB RAM on Ubuntu 14.04 (Linux 3.18.13).

We are updating rows in the database a lot/continuously.

There are no apparent indications of hardware errors (like ECC) in
dmesg, nor any error messages logged by the LSI MegaRAID controller, as
far as I can tell.

We are running PostgreSQL 9.3.14 currently.

The only thing I could see in the release notes since 9.3.14 that might
be related is this:

 "* Avoid very-low-probability data corruption due to testing tuple
visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
Tom Lane)"

Although reading more about it, it doesn't sound like it would exhibit
the symptoms we see?

We have recently increased the load (to around twice the number of
cores), though, which made me think we could be triggering corner cases
we haven't hit before.

We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to 
hear
if anyone has seen something like this, or have some ideas of how to
investigate/what the cause might be.


  Best regards,

Adam



--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org 
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-07 Thread Harry Ambrose

Hi,

I too have been experiencing this with a busy PostgreSQL instance.

I have been following the updates to the 9.4 branch hoping a fix will appear, 
but sadly no luck yet. I have manually replicated the issue on 9.4.4, 9.4.10 
and 9.4.12. My replication steps are:

BEGIN;
CREATE TABLE x (id BIGSERIAL PRIMARY KEY, payload1 VARCHAR, payload2 VARCHAR, 
payload3 VARCHAR, payload4 BIGINT, payload5 BIGINT);
/* Repeat until 2,000,000 rows are inserted */
INSERT INTO x (id, payload1, payload2, payload3, payload4, payload5) VALUES 
(random values of varying length/size to force random toast usage);
COMMIT;

VACUUM (ANALYZE, FULL);

BEGIN;
/* Repeat until all 2,000,000 rows are updated */
UPDATE x SET payload1 = , payload2 = , payload3 = , payload4 = , payload5 = ... 
again random values of varying length/size to force random toast usage
COMMIT;

VACCUM (ANALYZE, FULL);

The second vacuum causes an ERROR identical to that you are reporting below 
(unexpected chunk number n (expected n) for toast value...). However it may 
take up to ten attempts to replicate it.

Out of interest, are you using any tablespaces other than pg_default? I can 
only replicate the issue when using separately mounted tablespaces.

I have been investigating this quite extensively and everything I can find on 
the web suggests data corruption. However running the the following DO reports 
no errors and I can dump the database without issue.

DO $$
DECLARE

curid INT := 0;
vcontent RECORD;
badid BIGINT;

var1_sub VARCHAR;
var2_sub VARCHAR;
var3_sub VARCHAR;
var4_sub VARCHAR;
var5_sub VARCHAR;

BEGIN
FOR badid IN SELECT id FROM x 
LOOP
curid = curid + 1;

IF curid % 10 = 0 
THEN
RAISE NOTICE '% rows inspected', curid;
END IF;

BEGIN
SELECT *
INTO vcontent
FROM x
WHERE rowid = badid;

var1_sub := SUBSTR(vcontent.var1,2000,5000);
var2_sub := SUBSTR(vcontent.var2,2000,5000);   
var3_sub := SUBSTR(vcontent.var3,2000,5000);
var4_sub := SUBSTR(vcontent.var4::VARCHAR,2000,5000);
var5_sub := SUBSTR(vcontent.var5::VARCHAR,2000,5000);

EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Data for rowid % is corrupt', badid;
CONTINUE;
END;

END LOOP;
END;
$$;

I also found the following has been reported: 
https://www.postgresql.org/message-id/20161201165505.4360.28...@wrigleys.postgresql.org

Best wishes,
Harry

> On 7 Jun 2017, at 15:22, Achilleas Mantzios  
> wrote:
> 
> On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:
>> Our database has started reporting errors like this:
>> 
>>   2017-05-31 13:48:10 CEST ERROR:  unexpected chunk number 0 (expected 1) 
>> for toast value 14242189 in pg_toast_10919630
>>   ...
>>   2017-06-01 11:06:56 CEST ERROR:  unexpected chunk number 0 (expected 1) 
>> for toast value 19573520 in pg_toast_10919630
>> 
>> (157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
>> corresponds to a table with around 168 million rows.
>> 
>> These went away, but the next day we got similar errors from another
>> table:
>> 
>>   2017-06-02 05:59:50 CEST ERROR:  unexpected chunk number 0 (expected 1) 
>> for toast value 47060150 in pg_toast_10920100
>>   ...
>>   2017-06-02 06:14:54 CEST ERROR:  unexpected chunk number 0 (expected 1) 
>> for toast value 47226455 in pg_toast_10920100
>> 
>> (Only 4 this time) pg_toast_10920100 corresponds to a table with holds
>> around 320 million rows (these are our two large tables).
>> 
>> The next day we got 6 such errors and the day after 10 such errors. On
>> June 5th we got 94, yesterday we got 111, of which one looked a little
>> different:
>> 
>>   2017-06-06 17:32:21 CEST ERROR:  unexpected chunk size 1996 (expected 
>> 1585) in final chunk 0 for toast value 114925100 in pg_toast_10920100
>> 
>> and today the logs have 65 lines, ending with these:
>> 
>>   2017-06-07 14:49:53 CEST ERROR:  unexpected chunk number 0 (expected 1) 
>> for toast value 131114834 in pg_toast_10920100
>>   2017-06-07 14:53:41 CEST ERROR:  unexpected chunk number 0 (expected 1) 
>> for toast value 131149566 in pg_toast_10920100
> First try to find which tables those toast relations refer to :
> select 10919630::regclass , 10920100::regclass ;
> Are those critical tables? Can you restore them somehow?
> 
> Also you may consider
> REINDEX TABLE pg_toast.pg_toast_10920100;
> REINDEX TABLE pg_toast.pg_toast_10919630;
> REINDEX TABLE ;
> REINDEX TABLE ;
> 
> also VACUUM the above tables.
> 
> You might want to write a function which iterates over the damaged table's 
> rows in order to identify the damaged row(s). And then do some good

Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-07 Thread Achilleas Mantzios


On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:

Our database has started reporting errors like this:

   2017-05-31 13:48:10 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 14242189 in pg_toast_10919630
   ...
   2017-06-01 11:06:56 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 19573520 in pg_toast_10919630

(157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
corresponds to a table with around 168 million rows.

These went away, but the next day we got similar errors from another
table:

   2017-06-02 05:59:50 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 47060150 in pg_toast_10920100
   ...
   2017-06-02 06:14:54 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 47226455 in pg_toast_10920100

(Only 4 this time) pg_toast_10920100 corresponds to a table with holds
around 320 million rows (these are our two large tables).

The next day we got 6 such errors and the day after 10 such errors. On
June 5th we got 94, yesterday we got 111, of which one looked a little
different:

   2017-06-06 17:32:21 CEST ERROR:  unexpected chunk size 1996 (expected 1585) 
in final chunk 0 for toast value 114925100 in pg_toast_10920100

and today the logs have 65 lines, ending with these:

   2017-06-07 14:49:53 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 131114834 in pg_toast_10920100
   2017-06-07 14:53:41 CEST ERROR:  unexpected chunk number 0 (expected 1) for 
toast value 131149566 in pg_toast_10920100

First try to find which tables those toast relations refer to :
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?

Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE ;
REINDEX TABLE ;

also VACUUM the above tables.

You might want to write a function which iterates over the damaged table's rows 
in order to identify the damaged row(s). And then do some good update to create 
a new version.


The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
TB RAM on Ubuntu 14.04 (Linux 3.18.13).

We are updating rows in the database a lot/continuously.

There are no apparent indications of hardware errors (like ECC) in
dmesg, nor any error messages logged by the LSI MegaRAID controller, as
far as I can tell.

We are running PostgreSQL 9.3.14 currently.

The only thing I could see in the release notes since 9.3.14 that might
be related is this:

  "* Avoid very-low-probability data corruption due to testing tuple
 visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
 Tom Lane)"

Although reading more about it, it doesn't sound like it would exhibit
the symptoms we see?

We have recently increased the load (to around twice the number of
cores), though, which made me think we could be triggering corner cases
we haven't hit before.

We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to 
hear
if anyone has seen something like this, or have some ideas of how to
investigate/what the cause might be.


   Best regards,

 Adam



--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

40 matches

Mail list logo