Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from being updated

2017-06-19 Thread Dmitry O Litvintsev
Yes, we had to restart the database 4 days ago (and the vacuum resumed on start).
I checked the log files and discovered that an autovacuum of this table takes
about 6 days:

pages: 0 removed, 14072307 remain
tuples: 43524292 removed, 395006545 remain
buffer usage: -1493114028 hits, 107664973 misses, 30263658 dirtied
avg read rate: 1.604 MB/s, avg write rate: 0.451 MB/s
system usage: CPU 2055.81s/17710.94u sec elapsed 524356.57 sec

So the table is perpetually being autovacuumed (which I assumed to be a good
thing).

Table has 400M entries, 115 GB. 
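(For scale, the numbers above work out as follows: 524356.57 s elapsed is about
6.1 days; 107664973 buffer misses at 8 kB each are roughly 820 GB read from
disk, and 30263658 dirtied buffers are roughly 230 GB written, which matches
the reported 1.604 MB/s and 0.451 MB/s averages.)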

I will try your suggestions in the test environment. 

Thank you,
Dmitry 

From: Jeff Janes <jeff.ja...@gmail.com>
Sent: Monday, June 19, 2017 1:16 PM
To: Dmitry O Litvintsev
Cc: Andreas Kretschmer; pgsql-general@postgresql.org
Subject: Re: [GENERAL] autovacuum holds exclusive lock on table preventing it
from being updated

On Mon, Jun 19, 2017 at 10:33 AM, Dmitry O Litvintsev <litvi...@fnal.gov>
wrote:
Hi

Since I posted this, nothing has really changed. I am starting to panic
(mildly).

The source (production) runs:

  relname   |   mode   | granted |  
  substr|  query_start  
|  age
+--+-+--+---+
 t_inodes_iio_idx   | RowExclusiveLock | t   | autovacuum: 
VACUUM ANALYZE public.t_inodes (to prevent wraparound)   | 2017-06-15 
10:26:18.643209-05 | 4 days 01:58:56.697559


This is close to unreadable.  You can use \x to get output from psql which
survives email more readably.
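
For example, something along these lines (just an illustrative sketch of the
kind of lock query shown above, run with expanded output):

    \x on
    SELECT l.relation::regclass AS relname, l.mode, l.granted,
           a.query, a.query_start, age(now(), a.query_start) AS age
    FROM pg_locks l JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE a.query LIKE 'autovacuum:%';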

Your first report was 6 days ago.  Why is the job only 4 days old?  Are you 
frequently restarting your production server, so that the vacuum job never gets 
a chance to finish?  If so, that would explain your predicament.

And how big is this table, that it takes at least 4 days to VACUUM?
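
(If it isn't known offhand, something like this shows it, indexes and TOAST
included:)

    SELECT pg_size_pretty(pg_total_relation_size('public.t_inodes'));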

vacuum_cost_delay = 50ms

That is a lot.  The default value for this is 0.  The default value for 
autovacuum_vacuum_cost_delay is 20, which is usually too high for giant 
databases.

I think you are changing this in the wrong direction.  Rather than increase 
vacuum_cost_delay, you need to decrease autovacuum_vacuum_cost_delay, so that 
you won't keep having problems in the future.
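
Roughly what I mean, in postgresql.conf (the 2ms figure is only an example,
tune it to what your I/O can absorb):

    vacuum_cost_delay = 0                # manual VACUUM stays unthrottled
    autovacuum_vacuum_cost_delay = 2ms   # down from the 20ms default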


On your test server, change vacuum_cost_delay to zero and then initiate a
manual vacuum of the table.  It will block on the autovacuum's lock, so then
kill the autovacuum (best to have the manual vacuum queued up first, otherwise
it will be a race between when you start the manual vacuum and when the
autovacuum automatically restarts, to see who gets the lock).  See how long it
takes this unthrottled vacuum to run, and how much effect the IO it causes has
on the performance of other tasks.  If acceptable, repeat this on production
(although really, I don't think you have much of a choice on whether the effect
is acceptable or not--it needs to be done.)
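
A sketch of that sequence (the WHERE clause is just one way to find the
autovacuum worker):

    -- session 1: queue the unthrottled manual vacuum; it will wait for the lock
    SET vacuum_cost_delay = 0;
    VACUUM VERBOSE public.t_inodes;

    -- session 2: cancel the running autovacuum so session 1 can take over
    SELECT pg_cancel_backend(pid)
      FROM pg_stat_activity
     WHERE query LIKE 'autovacuum: VACUUM%t_inodes%';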

Cheers,

Jeff




Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from being updated

2017-06-19 Thread Dmitry O Litvintsev
Hi

Since I posted this, nothing has really changed. I am starting to panic
(mildly).

The source (production) runs:

       relname       |           mode           | granted |                               substr                                |          query_start          |          age
---------------------+--------------------------+---------+---------------------------------------------------------------------+-------------------------------+------------------------
 t_inodes_iio_idx    | RowExclusiveLock         | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound) | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
 t_inodes_pkey       | RowExclusiveLock         | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound) | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
                     | ExclusiveLock            | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound) | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
 t_inodes            | ShareUpdateExclusiveLock | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound) | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
 t_inodes_itype_idx  | RowExclusiveLock         | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound) | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
 t_inodes_imtime_idx | RowExclusiveLock         | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound) | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
The above does not impact production activity much.

On the test stand (where I pg_basebackupped from production and also upgraded 
to 9.6) I see:

       relname       |           mode           | granted |                           substr                            |          query_start          |          age
---------------------+--------------------------+---------+-------------------------------------------------------------+-------------------------------+------------------------
 t_inodes            | ShareUpdateExclusiveLock | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes_itype_idx  | RowExclusiveLock         | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes_imtime_idx | RowExclusiveLock         | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes_iio_idx    | RowExclusiveLock         | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes_pkey       | RowExclusiveLock         | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
                     | ExclusiveLock            | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes            | ShareUpdateExclusiveLock | f       | ANALYZE;                                                    | 2017-06-13 15:27:59.781285-05 | 5 days 20:59:17.860273
                     | ExclusiveLock            | t       | ANALYZE;                                                    | 2017-06-13 15:27:59.781285-05 | 5 days 20:59:17.860273


The test stand where I was going to test the schema upgrade is stuck because
the vacuum is blocking.
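
(On the 9.6 test stand, pg_stat_progress_vacuum at least shows what phase the
vacuum is in and how far it has gotten; a quick check:)

    SELECT pid, phase, heap_blks_total, heap_blks_scanned, heap_blks_vacuumed,
           index_vacuum_count
    FROM pg_stat_progress_vacuum;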

Production settings follow:

version 9.3.9

max_connections = 512
shared_buffers = 8192MB
temp_buffers = 1024MB
work_mem = 512MB
#maintenance_work_mem = 2048MB 
maintenance_work_mem = 4096MB #increased after 3 days of vacuum analyze running 
max_stack_depth = 2MB
vacuum_cost_delay = 50ms
synchronous_commit = off
wal_buffers = 245MB
wal_writer_delay = 10s
checkpoint_segments = 64
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 94GB
wal_level = hot_standby
hot_standby = on
archive_mode = on
archive_command = '/usr/loca/bin/wal_backup.sh %p %f'
max_wal_senders = 4
wal_keep_segments = 1024
max_standby_streaming_delay = 7200s

So, the problem: I cannot make the schema change until the vacuum has finished,
and there seems to be no end in sight for the vacuum to finish, which is
throwing off our software upgrade plans.

Can anything be done here?

Thanks,
Dmitry


From: Andreas Kretschmer <andr...@a-kretschmer.de>
Sent: Tuesday, June 13, 2017 1:54 PM
To: pgsql-general@postgresql.org; Dmitry O Litvintsev; 
pgsql-general@postgresql.org
Subject: Re: [GENERAL] autovacuum holds exclusive lock on table preventing it
from being updated

[GENERAL] autovacuum holds exclusive lock on table preventing it from being updated

2017-06-13 Thread Dmitry O Litvintsev
Hi, 

I run postgresql 9.3.17. I am preparing for a major database schema upgrade. 

I copied production database to test system using pg_basebackup. 

Having started the database and waited for all WALs to be applied I proceeded 
to run 
schema modifications. 

Immediately I ran into an issue: updates on a table get stuck because
autovacuum is running on that table and holds an exclusive lock:

 datname |       relname       | transactionid |           mode           | granted | usename |                           substr                            |          query_start          |       age       |  pid
---------+---------------------+---------------+--------------------------+---------+---------+-------------------------------------------------------------+-------------------------------+-----------------+-------
 chimera | t_inodes_itype_idx  |               | RowExclusiveLock         | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:25:22.285415 | 40672
 chimera | t_inodes_imtime_idx |               | RowExclusiveLock         | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:25:22.285415 | 40672
 chimera | t_inodes_iio_idx    |               | RowExclusiveLock         | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:25:22.285415 | 40672
 chimera | t_inodes_pkey       |               | RowExclusiveLock         | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:25:22.285415 | 40672
 chimera |                     |               | ExclusiveLock            | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:25:22.285415 | 40672
 chimera | t_inodes            |               | ShareUpdateExclusiveLock | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound)
 
If I kill the autovacuum (by running SELECT pg_cancel_backend(PID)), I get an
update going, but then another update gets stuck when autovacuum launches
again.

I tried setting autovacuum to off (together with track_counts) in the conf
file. After a restart, autovacuum still runs!
chimera=# show autovacuum;
 autovacuum 
------------
 off
(1 row)

Checking activity:
chimera=# select 
pg_stat_activity.datname,pg_class.relname,pg_locks.transactionid,
   pg_locks.mode, pg_locks.granted,pg_stat_activity.usename,
   substr(pg_stat_activity.query,1,256),
   pg_stat_activity.query_start, age(now(),pg_stat_activity.query_start) as 
"age",
   pg_stat_activity.pid from pg_stat_activity,pg_locks
   left outer join pg_class on (pg_locks.relation = pg_class.oid)
   where pg_locks.pid=pg_stat_activity.pid order by query_start;

still shows the autovacuum. Setting it to off seems to have no effect.

 datname |       relname       | transactionid |           mode           | granted | usename |                           substr                            |          query_start          |       age       |  pid
---------+---------------------+---------------+--------------------------+---------+---------+-------------------------------------------------------------+-------------------------------+-----------------+-------
 chimera | t_inodes_itype_idx  |               | RowExclusiveLock         | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:28:50.276437 | 40672
 chimera | t_inodes_imtime_idx |               | RowExclusiveLock         | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:28:50.276437 | 40672
 chimera | t_inodes_iio_idx    |               | RowExclusiveLock         | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:28:50.276437 | 40672
 chimera | t_inodes_pkey       |               | RowExclusiveLock         | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:28:50.276437 | 40672
 chimera |                     |               | ExclusiveLock            | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound) | 2017-06-13 12:31:04.870064-05 | 00:28:50.276437 | 40672
 chimera | t_inodes            |               | ShareUpdateExclusiveLock | t       | enstore | autovacuum: VACUUM public.t_inodes (to prevent wraparound)
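
(Worth noting: an anti-wraparound autovacuum is launched even with autovacuum =
off, which is what the "(to prevent wraparound)" in the query text indicates.
A quick way to see how close the table is to that threshold, as a sketch using
the names above:)

    SELECT relname, age(relfrozenxid) FROM pg_class WHERE relname = 't_inodes';
    SHOW autovacuum_freeze_max_age;  -- a forced vacuum starts once age() exceeds this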

Re: [GENERAL] postgresql93-9.3.5: deadlock when updating parent table expected?

2015-02-19 Thread Dmitry O Litvintsev
Thanks, Alvaro, 

Yes indeed.  I have a test that causes the deadlock almost immediately. 
I have upgraded to 9.3.6 and have been running for a few hours now w/o 
deadlock errors observed.

Dmitry

From: Alvaro Herrera [alvhe...@2ndquadrant.com]
Sent: Wednesday, February 18, 2015 6:19 AM
To: Dmitry O Litvintsev
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] postgresql93-9.3.5: deadlock when updating parent table 
expected?

Dmitry O Litvintsev wrote:
 Hi,

 I recently updated to postgresql93-9.3.5 (from 9.2.9). I see frequent
 deadlocks when updating the parent table on insert into a child table. There
 is a foreign key constraint between the child table and the parent table. The
 parent table is updated by a trigger on insert into the child table. So,
 pretty much a standard thing. Is it expected to deadlock?

This is probably caused by a bug that was fixed in 9.3.6:

Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Branch: master [0e5680f47] 2014-12-26 13:52:27 -0300
Branch: REL9_4_STABLE Release: REL9_4_1 [0e3a1f71d] 2014-12-26 13:52:27 -0300
Branch: REL9_3_STABLE Release: REL9_3_6 [048912386] 2014-12-26 13:52:27 -0300

Grab heavyweight tuple lock only before sleeping

We were trying to acquire the lock even when we were subsequently
not sleeping in some other transaction, which opens us up unnecessarily
to deadlocks.  In particular, this is troublesome if an update tries to
lock an updated version of a tuple and finds itself doing EvalPlanQual
update chain walking; more than two sessions doing this concurrently
will find themselves sleeping on each other because the HW tuple lock
acquisition in heap_lock_tuple called from EvalPlanQualFetch races with
the same tuple lock being acquired in heap_update -- one of these
sessions sleeps on the other one to finish while holding the tuple lock,
and the other one sleeps on the tuple lock.

Per trouble report from Andrew Sackville-West in

http://www.postgresql.org/message-id/20140731233051.GN17765@andrew-ThinkPad-X230

His scenario can be simplified down to a relatively simple
isolationtester spec file which I don't include in this commit; the
reason is that the current isolationtester is not able to deal with more
than one blocked session concurrently and it blocks instead of raising
the expected deadlock.  In the future, if we improve isolationtester, it
would be good to include the spec file in the isolation schedule.  I
posted it in
http://www.postgresql.org/message-id/20141212205254.gc1...@alvh.no-ip.org

Hat tip to Mark Kirkwood, who helped diagnose the trouble.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [GENERAL] postgresql93-9.3.5: deadlock when updating parent table expected?

2015-02-18 Thread Dmitry O Litvintsev
Thank you, Bill, 

Yes, the deadlock occurs when multiple processes insert concurrently into the
file table with the same volume id field. I said "sometimes" as opposed to
"all the time".

I think you advise retrying the transaction or adding a SELECT FOR UPDATE prior
to the insert. I will pursue this (together with the upgrade to 9.3.6 suggested
by Alvaro).
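
For reference, a rough sketch of the second suggestion, using the table names
from the original post below (locking the parent row up front serializes
concurrent inserters for the same volume):

    BEGIN;
    -- take the row lock before the FK check and the trigger's UPDATE need it
    SELECT id FROM volume WHERE name = 'foo' FOR UPDATE;
    INSERT INTO file (name, volume)
        VALUES ('f1', (SELECT id FROM volume WHERE name = 'foo'));
    COMMIT;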

Thanks,
Dmitry




[GENERAL] postgresql93-9.3.5: deadlock when updating parent table expected?

2015-02-17 Thread Dmitry O Litvintsev
Hi, 

I recently updated to postgresql93-9.3.5 (from 9.2.9). I see frequent deadlocks
when updating the parent table on insert into a child table. There is a foreign
key constraint between the child table and the parent table. The parent table
is updated by a trigger on insert into the child table. So, pretty much a
standard thing. Is it expected to deadlock?

A simplified version:

 create table volume (
   id serial primary key, 
   name varchar, 
   counter integer default(0));

 create table file (
   id serial primary key, 
   name varchar, 
   volume bigint, foreign key (volume) references volume(id));
  
 create or replace function update_volume_file_counter() 
 returns trigger as $$
 begin
 if (tg_op='INSERT') then
  update volume set counter=counter+1 where volume.id=new.volume;
  return new;
 elseif (tg_op='DELETE') then
  update volume set counter=counter-1 where volume.id=old.volume;
  return old;
 end if;
 end;
 $$
 language plpgsql;

 create trigger update_volume_counter
after insert or delete on file
for each row 
execute procedure update_volume_file_counter();

So a record is inserted into the file table and the counter gets updated in the
volume table. Nothing fancy.

insert into volume (name) values ('foo');
insert into file(name,volume) values ('f1',(select id from volume where name='foo'));
insert into file(name,volume) values ('f2',(select id from volume where name='foo'));

select * from volume;
 id | name | counter 
----+------+---------
  2 | foo  |   2
(1 row)

delete from file where name='f2';
DELETE 1
billing=# select * from volume;
 id | name | counter 
----+------+---------
  2 | foo  |   1
(1 row)

So the counter increments/decrements as it should. Works fine.
But in a real-life application where multiple threads are inserting into the
file table, I sometimes see:

ERROR:  deadlock detected
 Process 24611 waits for ExclusiveLock on tuple (1749,58) of relation 138328329
 of database 138328263; blocked by process 25082.
 Process 25082 waits for ShareLock on transaction 14829630; blocked by process
 24611.
 Process 24611: update volume set counter=counter+1 where id=new.volume;
 Process 25082: insert into file(name,volume) values ('f1',(select id from
 volume where name='foo'));
HINT:  See server log for query details.

(not a real log file excerpt). 

This does not happen all the time; it happens sometimes when multiple threads
add files to the same volume.

Question: am I doing something wrong, or is this deadlock expected? (I read
somewhere that when inserting into a child table, the corresponding record of
the parent table is locked.) I did not seem to encounter this issue in
postgresql 9.2 and 8.4, which I had before.

Should I drop the foreign key constraint?

Thanks,
Dmitry
