Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-23 Thread Jeff Janes
On Thu, Feb 23, 2017 at 2:42 AM, Bill Moran 
wrote:

> On Wed, 22 Feb 2017 13:19:11 -0800
> Jeff Janes  wrote:
>
> > On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure 
> wrote:
> > >
> > > On Thursday, February 16, 2017, Tom Lane  wrote:
> > >
> > >> Tim Bellis  writes:
> > >> > Even though this is a read only query, is it also expected to be
> > >> blocked behind the vacuum? Is there a way of getting indexes for a
> table
> > >> which won't be blocked behind a vacuum?
> > >>
> > >> It's not the vacuum that's blocking your read-only queries.  It's the
> > >> ALTER TABLE, which needs an exclusive lock in order to alter the
> table's
> > >> schema.  The ALTER is queued waiting for the vacuum to finish, and
> lesser
> > >> lock requests queue up behind it.  We could let the non-exclusive lock
> > >> requests go ahead of the ALTER, but that would create a severe risk
> of the
> > >> ALTER *never* getting to run.
> > >>
> > >> I'd kill the ALTER and figure on trying again after the vacuum is
> done.
> > >>
> > >>
> > > I've been drilled by this and similar lock stacking issues enough
> times to
> > > make me near 100% sure deferring the ALTER would be the better choice
> > >
> > >
> > This seems like a rather one-sided observation.  How could you know how
> > often the unimplemented behavior also would have "drilled" you, since it
> is
> > unimplemented?
> >
> > There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow
> other
> > requestors jump the queue if they were compatible with the held lock.  If
> > that is implemented, then you would just manually lock the table
> deferably
> > before invoking the ALTER TABLE command, if that is the behavior you
> wanted
> > (but it wouldn't work for things that can't be run in transactions)
>
> This seems redundant to me.
>
> We already have LOCK ... NOWAIT. It's fairly trivial to write an upgrade
> script that uses LOCK to explicitly lock tables that it's going to ALTER,
> then busy-waits if the lock is not immediately grantable.
>

As fairly trivial as it is, I bet I would mess it up a few times before I
got it right.  And then it would probably still be wrong in corner cases.
What if it fails not because the lock is unavailable, but for some obscure
error other than the ones anticipated or encountered during testing? And
busy-waiting is generally nasty and a waste of resources.


>
> The fact that so many ORMs and similar tools don't take advantage of that
> functionality is rather depressing.
>
> In my experience, I've also seen heavily loaded systems that this wouldn't
> work on, essentially because there is _always_ _some_ lock on every table.
> This is a case where experienced developers are required to take some
> extra time to coordinate their upgrades to work around the high load. But
> the proposed LOCK TABLE ... DEFERRABLE wouldn't help there either, because
> the ALTER would be deferred indefinitely.
>

True.  That is why it would be available only upon request, not the new
default.



> Personally, I feel like the existing behavior is preferrable. Software
> teams need to take the time to understand the locking implications of their
> actions or they'll have nothing but trouble anyway.
>
> As I've seen time and again: writing an application that handles low load
> and low concurrency is fairly trivial, but scaling that app up to high
> load and/or high concurrency generally sorts out the truely brilliant
> developers from the merely average.
>

So why not give the merely average some better tools?

Cheers,

Jeff


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-23 Thread Merlin Moncure
On Wed, Feb 22, 2017 at 3:19 PM, Jeff Janes  wrote:
> On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure  wrote:
>> On Thursday, February 16, 2017, Tom Lane  wrote:
>>>
>>> Tim Bellis  writes:
>>> > Even though this is a read only query, is it also expected to be
>>> > blocked behind the vacuum? Is there a way of getting indexes for a table
>>> > which won't be blocked behind a vacuum?
>>>
>>> It's not the vacuum that's blocking your read-only queries.  It's the
>>> ALTER TABLE, which needs an exclusive lock in order to alter the table's
>>> schema.  The ALTER is queued waiting for the vacuum to finish, and lesser
>>> lock requests queue up behind it.  We could let the non-exclusive lock
>>> requests go ahead of the ALTER, but that would create a severe risk of
>>> the
>>> ALTER *never* getting to run.
>>>
>>> I'd kill the ALTER and figure on trying again after the vacuum is done.
>>
>> I've been drilled by this and similar lock stacking issues enough times to
>> make me near 100% sure deferring the ALTER would be the better choice
>
> This seems like a rather one-sided observation.  How could you know how
> often the unimplemented behavior also would have "drilled" you, since it is
> unimplemented?

Well, that I can't really say, but at least in my case ALTER TABLE in
the face of concurrent application activity can wait but locking
tables for reading for an indefinite period will rapidly destabilize
the system.  An example of this usage is replacing partitions on a
warehouse table.  About half of my P1s over the last 12 months ago are
relating to locking problem of some kind.

So I end up during workarounds such as issuing "LOCK...NOWAIT" in a
sleep loop :( or application restructuring, especially minimizing use
of TRUNCATE.

I do think instrumentation around locking behaviors would be helpful.
Allowing (optionally) waiters to leapfrog in if they can clear would
be wonderful as would being able to specify maximum wait timeouts
inside a transaction.  FWIW, I'm not sure this behavior makes sense
attached to LOCK, I'd rather see them attached generally to SET
TRANSACTION -- my 0.02$ (talk is cheap, etc).

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-23 Thread Bill Moran
On Wed, 22 Feb 2017 13:19:11 -0800
Jeff Janes  wrote:

> On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure  wrote:
> >
> > On Thursday, February 16, 2017, Tom Lane  wrote:
> >
> >> Tim Bellis  writes:
> >> > Even though this is a read only query, is it also expected to be
> >> blocked behind the vacuum? Is there a way of getting indexes for a table
> >> which won't be blocked behind a vacuum?
> >>
> >> It's not the vacuum that's blocking your read-only queries.  It's the
> >> ALTER TABLE, which needs an exclusive lock in order to alter the table's
> >> schema.  The ALTER is queued waiting for the vacuum to finish, and lesser
> >> lock requests queue up behind it.  We could let the non-exclusive lock
> >> requests go ahead of the ALTER, but that would create a severe risk of the
> >> ALTER *never* getting to run.
> >>
> >> I'd kill the ALTER and figure on trying again after the vacuum is done.
> >>
> >>
> > I've been drilled by this and similar lock stacking issues enough times to
> > make me near 100% sure deferring the ALTER would be the better choice
> >
> >
> This seems like a rather one-sided observation.  How could you know how
> often the unimplemented behavior also would have "drilled" you, since it is
> unimplemented?
> 
> There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow other
> requestors jump the queue if they were compatible with the held lock.  If
> that is implemented, then you would just manually lock the table deferably
> before invoking the ALTER TABLE command, if that is the behavior you wanted
> (but it wouldn't work for things that can't be run in transactions)

This seems redundant to me.

We already have LOCK ... NOWAIT. It's fairly trivial to write an upgrade
script that uses LOCK to explicitly lock tables that it's going to ALTER,
then busy-waits if the lock is not immediately grantable.

The fact that so many ORMs and similar tools don't take advantage of that
functionality is rather depressing.

In my experience, I've also seen heavily loaded systems that this wouldn't
work on, essentially because there is _always_ _some_ lock on every table.
This is a case where experienced developers are required to take some
extra time to coordinate their upgrades to work around the high load. But
the proposed LOCK TABLE ... DEFERRABLE wouldn't help there either, because
the ALTER would be deferred indefinitely.

Personally, I feel like the existing behavior is preferrable. Software
teams need to take the time to understand the locking implications of their
actions or they'll have nothing but trouble anyway.

As I've seen time and again: writing an application that handles low load
and low concurrency is fairly trivial, but scaling that app up to high
load and/or high concurrency generally sorts out the truely brilliant
developers from the merely average.

-- 
Bill Moran 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-22 Thread Jeff Janes
On Fri, Feb 17, 2017 at 10:36 AM, Tim Bellis <tim.bel...@metaswitch.com>
wrote:

>
>
>
>
> *From:* Jeff Janes [mailto:jeff.ja...@gmail.com]
> *Sent:* 17 February 2017 02:59
> *To:* Tim Bellis <tim.bel...@metaswitch.com>
> *Cc:* pgsql-general@postgresql.org
> *Subject:* Re: [GENERAL] Autovacuum stuck for hours, blocking queries
>
>
>
> On Wed, Feb 15, 2017 at 9:30 AM, Tim Bellis <tim.bel...@metaswitch.com>
> wrote:
>
> I have a postgres 9.3.4 database table which (intermittently but reliably)
> gets into a state where queries get blocked indefinitely (at least for many
> hours) behind an automatic vacuum. I was under the impression that vacuum
> should never take any blocking locks for any significant period of time,
> and so would like help resolving the issue.
>
> The process blocking the query is:
> postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum
> worker process   
> which is running the query
> autovacuum: VACUUM public.
>
>
>
> Are you sure it doesn't really say:
>
>
>
> autovacuum: VACUUM public. (to prevent wraparound)
>
> *[Tim Bellis] It doesn’t. I was using the query from *
> *https://wiki.postgresql.org/wiki/Lock_Monitoring*
> <https://wiki.postgresql.org/wiki/Lock_Monitoring>* and looking at the
> ‘current_statement_in_blocking_process’ column. Is there a different query
> I should be using?*
>

That query seems to be a bit mangled.  At one time, it only found row-level
locks.  Someone changed that, but didn't remove the comment "these only
find row-level locks, not object-level locks"

Also, the "WHERE NOT blocked_locks.GRANTED" should perhaps be:

WHERE NOT blocked_locks.GRANTED and blocking_locks.GRANTED;

As it is, every waiting query reports that it is waiting on all of its
fellow victims as well as the thing(s) actually blocking it. But my WHERE
clause is not really correct either, as it is possible that it is one
blocked thing is being blocked by a different blocked thing which is ahead
of it in the queue, when without that intervening blocked requestor it
could be immediately granted if its request mode is compatible with the
held mode(s).  I don't think there is a query that can reveal what is most
immediately blocking it.

But, I don't see how this explains what you see.  An autovacuum without
"(to prevent wraparound)" should not block anything for much more than a
second (unless you changed deadlock_timeout) and should not be blocked by
anything either as it just gives up on the operation if the lock is not
immediately available.

Cheers,

Jeff


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-22 Thread Jeff Janes
On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure  wrote:

>
>
> On Thursday, February 16, 2017, Tom Lane  wrote:
>
>> Tim Bellis  writes:
>> > Even though this is a read only query, is it also expected to be
>> blocked behind the vacuum? Is there a way of getting indexes for a table
>> which won't be blocked behind a vacuum?
>>
>> It's not the vacuum that's blocking your read-only queries.  It's the
>> ALTER TABLE, which needs an exclusive lock in order to alter the table's
>> schema.  The ALTER is queued waiting for the vacuum to finish, and lesser
>> lock requests queue up behind it.  We could let the non-exclusive lock
>> requests go ahead of the ALTER, but that would create a severe risk of the
>> ALTER *never* getting to run.
>>
>> I'd kill the ALTER and figure on trying again after the vacuum is done.
>>
>>
> I've been drilled by this and similar lock stacking issues enough times to
> make me near 100% sure deferring the ALTER would be the better choice
>
>
This seems like a rather one-sided observation.  How could you know how
often the unimplemented behavior also would have "drilled" you, since it is
unimplemented?

There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow other
requestors jump the queue if they were compatible with the held lock.  If
that is implemented, then you would just manually lock the table deferably
before invoking the ALTER TABLE command, if that is the behavior you wanted
(but it wouldn't work for things that can't be run in transactions)

Ideally each requestor would specify if they will hold the lock for a long
timer or a short time.  Them a short requestor which is blocked behind a
long requestor could let other compatible-with-held requests jump over it.
But once it was only blocked by short locks, it would reassert the normal
order, so it can't get permanently blocked by a constantly overlapping
stream of short locks.  But how would you get all lock requestors to
provide a reasonable estimate?

Cheers,

Jeff


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-21 Thread Tim Bellis


-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: 16 February 2017 22:40
To: Tim Bellis <tim.bel...@metaswitch.com>
Cc: Adrian Klaver <adrian.kla...@aklaver.com>; pgsql-general@postgresql.org; 
Alvaro Herrera <alvhe...@2ndquadrant.com>; Scott Marlowe 
<scott.marl...@gmail.com>
Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries

Tim Bellis <tim.bel...@metaswitch.com> writes:
> > Even though this is a read only query, is it also expected to be blocked 
> > behind the vacuum? Is there a way of getting indexes for a table which 
> > won't be blocked behind a vacuum?

> It's not the vacuum that's blocking your read-only queries.  It's the ALTER 
> TABLE, which needs an exclusive lock in order to alter the table's schema.  
> The ALTER is queued waiting for the vacuum to finish, and lesser lock 
> requests queue up behind it.  We could let the non-exclusive lock requests go 
> ahead of the ALTER, but that would create a severe risk of the ALTER *never* 
> getting to run.

The lock monitoring query (https://wiki.postgresql.org/wiki/Lock_Monitoring) 
said that the blocking_pid and the current_statement_in_blocking_process for 
the queries reading the index data was the autovacuum, not the ALTER. Am I 
reading the output wrong? Does it not correctly represent the chain of locks?

> I'd kill the ALTER and figure on trying again after the vacuum is done.

> Also you might want to look into how you got into a situation where you have 
> an anti-wraparound vacuum that's taking so long to run.
> You didn't do something silly like disable autovacuum did you?
No, autovacuum is on (and this is an autovacuum which is in progress). But I 
will look at why I'm getting a blocking autovacuum.

>   regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-20 Thread Merlin Moncure
On Thursday, February 16, 2017, Tom Lane  wrote:

> Tim Bellis > writes:
> > Even though this is a read only query, is it also expected to be blocked
> behind the vacuum? Is there a way of getting indexes for a table which
> won't be blocked behind a vacuum?
>
> It's not the vacuum that's blocking your read-only queries.  It's the
> ALTER TABLE, which needs an exclusive lock in order to alter the table's
> schema.  The ALTER is queued waiting for the vacuum to finish, and lesser
> lock requests queue up behind it.  We could let the non-exclusive lock
> requests go ahead of the ALTER, but that would create a severe risk of the
> ALTER *never* getting to run.
>
> I'd kill the ALTER and figure on trying again after the vacuum is done.
>
>
I've been drilled by this and similar lock stacking issues enough times to
make me near 100% sure deferring the ALTER would be the better choice


merlin


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-18 Thread Adrian Klaver

On 02/17/2017 11:54 PM, Michael Paquier wrote:

On Sat, Feb 18, 2017 at 1:32 PM, Scott Marlowe  wrote:

Yes it can. Truncate has been rollbackable for a while now.


Per the docs:
https://www.postgresql.org/docs/9.6/static/sql-truncate.html
"TRUNCATE is transaction-safe with respect to the data in the tables:
the truncation will be safely rolled back if the surrounding
transaction does not commit."
In short yes a transaction doing a truncate can be rollbacked.



I think the part that confuses people into thinking it can not be 
rollbacked is this:


"TRUNCATE is not MVCC-safe. After truncation, the table will appear 
empty to concurrent transactions, if they are using a snapshot taken 
before the truncation occurred. See Section 13.5 for more details."



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-17 Thread Michael Paquier
On Sat, Feb 18, 2017 at 1:32 PM, Scott Marlowe  wrote:
> Yes it can. Truncate has been rollbackable for a while now.

Per the docs:
https://www.postgresql.org/docs/9.6/static/sql-truncate.html
"TRUNCATE is transaction-safe with respect to the data in the tables:
the truncation will be safely rolled back if the surrounding
transaction does not commit."
In short yes a transaction doing a truncate can be rollbacked.
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-17 Thread Scott Marlowe
On Fri, Feb 17, 2017 at 1:38 PM, Rakesh Kumar
 wrote:
> LOCK TABLE yourtable ;
> CREATE TEMPORARY TABLE keep AS SELECT * FROM yourtable WHERE ;
> TRUNCATE yourtable;
> INSERT INTO yourtable SELECT * from keep;
> COMMIT;
> ===
> the above snippet assumes truncate in PG can be in a transaction. In other 
> words, while truncate by itself
> is atomic, it can't be rolled back. So in the above case, if "INSERT INTO 
> yourtable SELECT * from keep;" and
> we rollback, will it rollback yourtable.


Yes it can. Truncate has been rollbackable for a while now.

begin;
create table
insert into table
truncate old table
. something goes wrong .
rollback;

Unless I misunderstand your meaning.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-17 Thread Rakesh Kumar
LOCK TABLE yourtable ;
CREATE TEMPORARY TABLE keep AS SELECT * FROM yourtable WHERE ;
TRUNCATE yourtable;
INSERT INTO yourtable SELECT * from keep;
COMMIT;
===
the above snippet assumes truncate in PG can be in a transaction. In other 
words, while truncate by itself
is atomic, it can't be rolled back. So in the above case, if "INSERT INTO 
yourtable SELECT * from keep;" and
we rollback, will it rollback yourtable.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-17 Thread Hannes Erven

Hi Tim,


Am 2017-02-17 um 17:02 schrieb Tim Bellis:

The DELETE operations only deletes rows from the

> previous day. It's possible that there have been rows
> added that day which ought not to be deleted, so
> TRUNCATE wouldn't work.

OK, then I'll try two other suggestions:

- use table partitioning ( 
https://www.postgresql.org/docs/9.3/static/ddl-partitioning.html )


- if the number of rows you need to keep is small, you could try 
something like this:

LOCK TABLE yourtable ;
CREATE TEMPORARY TABLE keep AS SELECT * FROM yourtable WHERE ;
TRUNCATE yourtable;
INSERT INTO yourtable SELECT * from keep;
COMMIT;


Best regards,

-hannes



-Original Message-
From: Hannes Erven [mailto:han...@erven.at]
Sent: 17 February 2017 11:47
To: pgsql-general@postgresql.org
Cc: Tim Bellis <tim.bel...@metaswitch.com>
Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries

Hi Tim,



Am 2017-02-15 um 18:30 schrieb Tim Bellis:

I have a postgres 9.3.4 database table which (intermittently but
reliably)

 > gets into a state where queries get blocked indefinitely  > [..]

Notes:
  - This database table is used for about 6 million row writes per
day,

 > all of which are then deleted at the end of the day.

If you are dumping the contents of the table anyways, why not use TRUNCATE 
instead of DELETE? It unlinks and recreates the table data files, requiring 
nearly zero IO and analyzing.
Or even drop, and recreate the table with the correct new structure so you do 
not even need to ALTER TABLE ?

I'm a bit skeptical of these suggestions since very competent people have already 
answered your post and did not come up with this... ;-) the only drawback I'm aware of is 
that TRUNCATE will immediatly free disk space on the OS level, so the table's space will 
not be "blocked". But probably the VACUUM you are currently performing will 
also eventually release the unused disk space, so this may or may not match the current 
behaviour.


Best regards,

-hannes







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-17 Thread Tim Bellis


From: Jeff Janes [mailto:jeff.ja...@gmail.com]
Sent: 17 February 2017 02:59
To: Tim Bellis <tim.bel...@metaswitch.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries

On Wed, Feb 15, 2017 at 9:30 AM, Tim Bellis 
<tim.bel...@metaswitch.com<mailto:tim.bel...@metaswitch.com>> wrote:
I have a postgres 9.3.4 database table which (intermittently but reliably) gets 
into a state where queries get blocked indefinitely (at least for many hours) 
behind an automatic vacuum. I was under the impression that vacuum should never 
take any blocking locks for any significant period of time, and so would like 
help resolving the issue.

The process blocking the query is:
postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum worker 
process   
which is running the query
autovacuum: VACUUM public.

Are you sure it doesn't really say:

autovacuum: VACUUM public. (to prevent wraparound)
[Tim Bellis] It doesn’t. I was using the query from 
https://wiki.postgresql.org/wiki/Lock_Monitoring and looking at the 
‘current_statement_in_blocking_process’ column. Is there a different query I 
should be using?

If it doesn't include the "to prevent wraparound", then it should sacrifice 
itself as soon as it realizes it is blocking something else.  If it is not 
doing that, something is wrong.

If it does say "(to prevent wraparound)", then see all the other comments on 
this thread.


Notes:
  - This database table is used for about 6 million row writes per day, all of 
which are then deleted at the end of the day.

How many transactions do those 6 million writes comprise?
[Tim Bellis] I’m not sure - I think it’s 6 million transactions, but there 
might be some batching going on that I’m unaware of. What would this affect? (I 
can dig in if necessary)
(I might have been slightly wrong in characterising the exact behaviour; the 
table might be cleared every hour rather than every day, but there are still 6 
million writes per day)


  - Our application kicks off a manual vacuum against this table each night 
which doesn't hit this problem, as far as we're aware.

If it were having problems, would you be aware of it?  Do you see in the log 
files the completion of the vacuum?  Or look in pg_stat_user_tables to see when 
last_vacuum was.  If it runs every night and succeeds, it is hard to see why 
wraparound would ever kick in.  Unless you are hitting 150,000,000 transactions 
in a day.
[Tim Bellis] I shall investigate this.

Cheers,

Jeff


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-17 Thread Tim Bellis
The DELETE operations only deletes rows from the previous day. It's possible 
that there have been rows added that day which ought not to be deleted, so 
TRUNCATE wouldn't work.

But that was a helpful suggestion - thanks!

Tim

-Original Message-
From: Hannes Erven [mailto:han...@erven.at] 
Sent: 17 February 2017 11:47
To: pgsql-general@postgresql.org
Cc: Tim Bellis <tim.bel...@metaswitch.com>
Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries

Hi Tim,



Am 2017-02-15 um 18:30 schrieb Tim Bellis:
> I have a postgres 9.3.4 database table which (intermittently but 
> reliably)
 > gets into a state where queries get blocked indefinitely  > [..]
> Notes:
>   - This database table is used for about 6 million row writes per 
> day,
 > all of which are then deleted at the end of the day.

If you are dumping the contents of the table anyways, why not use TRUNCATE 
instead of DELETE? It unlinks and recreates the table data files, requiring 
nearly zero IO and analyzing.
Or even drop, and recreate the table with the correct new structure so you do 
not even need to ALTER TABLE ?

I'm a bit skeptical of these suggestions since very competent people have 
already answered your post and did not come up with this... ;-) the only 
drawback I'm aware of is that TRUNCATE will immediatly free disk space on the 
OS level, so the table's space will not be "blocked". But probably the VACUUM 
you are currently performing will also eventually release the unused disk 
space, so this may or may not match the current behaviour.


Best regards,

-hannes



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-17 Thread Hannes Erven

Hi Tim,



Am 2017-02-15 um 18:30 schrieb Tim Bellis:

I have a postgres 9.3.4 database table which (intermittently but reliably)

> gets into a state where queries get blocked indefinitely
> [..]

Notes:
  - This database table is used for about 6 million row writes per day,

> all of which are then deleted at the end of the day.

If you are dumping the contents of the table anyways, why not use 
TRUNCATE instead of DELETE? It unlinks and recreates the table data 
files, requiring nearly zero IO and analyzing.
Or even drop, and recreate the table with the correct new structure so 
you do not even need to ALTER TABLE ?


I'm a bit skeptical of these suggestions since very competent people 
have already answered your post and did not come up with this... ;-) the 
only drawback I'm aware of is that TRUNCATE will immediatly free disk 
space on the OS level, so the table's space will not be "blocked". But 
probably the VACUUM you are currently performing will also eventually 
release the unused disk space, so this may or may not match the current 
behaviour.



Best regards,

-hannes



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-16 Thread Alvaro Herrera
Tom Lane wrote:

> Also you might want to look into how you got into a situation where
> you have an anti-wraparound vacuum that's taking so long to run.

If there are ALTERs running all the time, regular (non-anti-wraparound)
vacuums would be canceled and never get a chance to run.  Eventually,
autovacuum decides it's had enough and doesn't cancel anymore, so
everyone else gets stuck behind.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-16 Thread Tom Lane
Tim Bellis  writes:
> Even though this is a read only query, is it also expected to be blocked 
> behind the vacuum? Is there a way of getting indexes for a table which won't 
> be blocked behind a vacuum?

It's not the vacuum that's blocking your read-only queries.  It's the
ALTER TABLE, which needs an exclusive lock in order to alter the table's
schema.  The ALTER is queued waiting for the vacuum to finish, and lesser
lock requests queue up behind it.  We could let the non-exclusive lock
requests go ahead of the ALTER, but that would create a severe risk of the
ALTER *never* getting to run.

I'd kill the ALTER and figure on trying again after the vacuum is done.

Also you might want to look into how you got into a situation where
you have an anti-wraparound vacuum that's taking so long to run.
You didn't do something silly like disable autovacuum did you?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-16 Thread Adrian Klaver

On 02/16/2017 08:45 AM, Tim Bellis wrote:

Thank you all - that's really useful :-)

The other query that gets blocked behind the vacuum is the below (truncated).

This query is generated by jdbc in this method:
org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getIndexInfo(AbstractJdbc2DatabaseMetaData.java:4023)


Is JDBC doing anything else before issuing this?



Even though this is a read only query, is it also expected to be blocked behind 
the vacuum? Is there a way of getting indexes for a table which won't be 
blocked behind a vacuum?


Table 13.2 here:

https://www.postgresql.org/docs/9.5/static/explicit-locking.html

shows the conflicts with SHARE UPDATE EXCLUSIVE(vacuum).

pg_locks:

https://www.postgresql.org/docs/9.5/static/view-pg-locks.html

shows locks being held. So next time it happens I would take a look and 
see if you can work backwards from there.


You could directly access the index information using:

https://www.postgresql.org/docs/9.6/static/catalog-pg-index.html
https://www.postgresql.org/docs/9.6/static/catalog-pg-class.html



Thank you all again,

Tim

SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM,   ct.relname AS TABLE_NAME, NOT 
i.indisunique AS NON_UNIQUE,   NULL AS INDEX_QUALIFIER, ci.relname AS INDEX_NAME,   
CASE i.indisclustered WHEN true THEN 1ELSE CASE am.amname   WHEN 'hash' 
THEN 2  ELSE 3END   END AS TYPE,   (i.keys).n AS ORDINAL_POSITION,   
pg_catalog.pg_get_indexdef(ci.oid, (i.keys).n, false) AS COLUMN_NAME,   CASE 
am.amcanorder WHEN true THEN CASE i.indoption[(i.keys).n - 1] & 1   
WHEN 1 THEN 'D'   ELSE 'A' END ELSE NULL   END AS ASC_OR_DESC,   
ci.reltuples AS CARDINALITY,   ci.relpages AS PAGES,   
pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS FILTER_CONDITION FROM 
pg_catalog.pg_class ct   JOIN pg_catalog.pg_namespace n ON (ct.relnamespace = 
n.oid)   JOIN (SELECT i.indexrelid, i.indrelid, i.indoption,   
i.indisunique, i.indisclustered, i.indpred,   i.indexprs,   
information_schema._pg_expandarray(i.indkey) AS keys FROM 
pg_catalog.pg_index i) i ON (ct.oid = i.ind


This query is cut off so cannot say whether it is the issue or not.






--
Adrian Klaver
adrian.kla...@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-16 Thread Tim Bellis
Thank you all - that's really useful :-)

The other query that gets blocked behind the vacuum is the below (truncated).

This query is generated by jdbc in this method:
org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getIndexInfo(AbstractJdbc2DatabaseMetaData.java:4023)

Even though this is a read only query, is it also expected to be blocked behind 
the vacuum? Is there a way of getting indexes for a table which won't be 
blocked behind a vacuum?

Thank you all again,

Tim

SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM,   ct.relname AS TABLE_NAME, 
NOT i.indisunique AS NON_UNIQUE,   NULL AS INDEX_QUALIFIER, ci.relname AS 
INDEX_NAME,   CASE i.indisclustered WHEN true THEN 1ELSE CASE am.amname 
  WHEN 'hash' THEN 2  ELSE 3END   END AS TYPE,   (i.keys).n AS 
ORDINAL_POSITION,   pg_catalog.pg_get_indexdef(ci.oid, (i.keys).n, false) AS 
COLUMN_NAME,   CASE am.amcanorder WHEN true THEN CASE 
i.indoption[(i.keys).n - 1] & 1   WHEN 1 THEN 'D'   ELSE 'A' END
 ELSE NULL   END AS ASC_OR_DESC,   ci.reltuples AS CARDINALITY,   ci.relpages 
AS PAGES,   pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS FILTER_CONDITION 
FROM pg_catalog.pg_class ct   JOIN pg_catalog.pg_namespace n ON 
(ct.relnamespace = n.oid)   JOIN (SELECT i.indexrelid, i.indrelid, i.indoption, 
  i.indisunique, i.indisclustered, i.indpred,   i.indexprs, 
  information_schema._pg_expandarray(i.indkey) AS keys FROM 
pg_catalog.pg_index i) i ON (ct.oid = i.ind


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-15 Thread Alvaro Herrera
Scott Marlowe wrote:

> Then a ddl gets in line. It has to wait on the vacuum, and the vacuum,
> set to run super slow. And everybody waits. On vacuum.

Note that this is normally not seen, because autovacuum cancels itself
when somebody is blocked behind it -- until the table reaches the
freeze_max_age limit, and then autovacuum is a for-wraparound one that
is no longer terminated, and then everybody has to wait on it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-15 Thread Scott Marlowe
On Wed, Feb 15, 2017 at 3:26 PM, Scott Marlowe  wrote:
> On Wed, Feb 15, 2017 at 10:30 AM, Tim Bellis  
> wrote:
>> I have a postgres 9.3.4 database table which (intermittently but reliably) 
>> gets into a state where queries get blocked indefinitely (at least for many 
>> hours) behind an automatic vacuum. I was under the impression that vacuum 
>> should never take any blocking locks for any significant period of time, and 
>> so would like help resolving the issue.
>>
>> The process blocking the query is:
>> postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum 
>> worker process   
>> which is running the query
>> autovacuum: VACUUM public.
>>
>> The query being blocked is:
>> ALTER TABLE  ALTER COLUMN  DROP DEFAULT
>> (But I have seen this previously with other queries being blocked. I used 
>> the SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine 
>> which queries were blocked)
>
> There are also ways of making the table less likely / not likely /
> will not get vacuum automatically. If you're willing to schedule ddl
> and vacuum on your own you can then mix the two in relative safety.

Followup: 
https://www.postgresql.org/docs/9.3/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-15 Thread Scott Marlowe
On Wed, Feb 15, 2017 at 10:30 AM, Tim Bellis  wrote:
> I have a postgres 9.3.4 database table which (intermittently but reliably) 
> gets into a state where queries get blocked indefinitely (at least for many 
> hours) behind an automatic vacuum. I was under the impression that vacuum 
> should never take any blocking locks for any significant period of time, and 
> so would like help resolving the issue.
>
> The process blocking the query is:
> postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum worker 
> process   
> which is running the query
> autovacuum: VACUUM public.
>
> The query being blocked is:
> ALTER TABLE  ALTER COLUMN  DROP DEFAULT
> (But I have seen this previously with other queries being blocked. I used the 
> SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which 
> queries were blocked)

Yup, there's a priority inversion in DDL, DML and maintenance
(vacuum). Vacuum runs slow in the background. Normal
update/delete/insert work fine because of the type of lock vacuum has.

Then a ddl gets in line. It has to wait on the vacuum, and the vacuum,
set to run super slow. And everybody waits. On vacuum.

Basically it's bad practice to alter tables that are big and being
worked on, because one way or another you're going to pay a price.

I've used partitions for logging and auditing that autocreate and drop
and vacuum, but they never get ddl done on them when they're getting
updated and vice versa.

There are also ways of making the table less likely / not likely /
will not get vacuum automatically. If you're willing to schedule ddl
and vacuum on your own you can then mix the two in relative safety.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-15 Thread Adrian Klaver

On 02/15/2017 09:30 AM, Tim Bellis wrote:

I have a postgres 9.3.4 database table which (intermittently but reliably) gets 
into a state where queries get blocked indefinitely (at least for many hours) 
behind an automatic vacuum. I was under the impression that vacuum should never 
take any blocking locks for any significant period of time, and so would like 
help resolving the issue.

The process blocking the query is:
postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum worker process 
  
which is running the query
autovacuum: VACUUM public.

The query being blocked is:
ALTER TABLE  ALTER COLUMN  DROP DEFAULT
(But I have seen this previously with other queries being blocked. I used the 
SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which 
queries were blocked)



Other ALTER TABLE queries?

If so I believe this might apply:

https://www.postgresql.org/docs/9.5/static/explicit-locking.html

SHARE UPDATE EXCLUSIVE

Conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW 
EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode 
protects a table against concurrent schema changes and VACUUM runs.


Acquired by VACUUM (without FULL), ANALYZE, CREATE INDEX 
CONCURRENTLY, and ALTER TABLE VALIDATE and other ALTER TABLE variants 
(for full details see ALTER TABLE).



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-15 Thread Tim Bellis
I have a postgres 9.3.4 database table which (intermittently but reliably) gets 
into a state where queries get blocked indefinitely (at least for many hours) 
behind an automatic vacuum. I was under the impression that vacuum should never 
take any blocking locks for any significant period of time, and so would like 
help resolving the issue.

The process blocking the query is:
postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum worker 
process   
which is running the query
autovacuum: VACUUM public.

The query being blocked is:
ALTER TABLE  ALTER COLUMN  DROP DEFAULT
(But I have seen this previously with other queries being blocked. I used the 
SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which 
queries were blocked)

Notes:
  - This database table is used for about 6 million row writes per day, all of 
which are then deleted at the end of the day.
  - We have a reasonable number of systems running identical databases and near 
identical load profiles, and this problem has only been seen on two of those 
systems. It occurs intermittently but reliably (once every few days). It 
persists until postgres is restarted (usually accomplished by a reboot) or the 
autovacuum thread is forcibly killed. Although it often happens again after a 
short period of time when the autovacuum worker process starts.
  - I don't have data on whether this vacuum ever finishes - it certainly takes 
longer than a few hours.
  - Our application kicks off a manual vacuum against this table each night 
which doesn't hit this problem, as far as we're aware.
  - I have sometimes seen this behaviour when there was database corruption. In 
that case I see logs like the following. But it also occurs when there are no 
logs indicating database corruption (which is the purpose of this email)
LOG:  no left sibling (concurrent deletion?) in ""
ERROR:  right sibling's left-link doesn't match: block 41 links to 127 
instead of expected 79 in index ""

Things that I have tried:
  - Printing out backtrace a few times from connecting to that process with gdb 
(see below)
- This indicates that the vacuum process isn't stuck, but I don't know what 
else it says
  - Reading the FAQ and release notes of other 9.3.x versions to see whether 
this looks like a known issue. I couldn't see anything obvious apart from an 
issue about  files (which didn't apply to my system - there was SQL in the 
release note to check).
  - Forcibly killing the autovacuum thread (obviously not a good solution to 
the problem!). This causes the ALTER TABLE query to complete correctly.

Questions:
  - Is this expected? I was under the impression that vacuum should never take 
any blocking locks for any significant period of time.
  - Should I report this as a bug?
  - Should I change some config options to avoid this? (I know turning off 
autovacuum would avoid it, but that's not good practice) I've put the vacuum 
settings for my database below.
  - Should I avoid certain queries which autovacuum is happening?
  - Are there more diags that I should gather to help diagnose this issue?

Thanks!

Tim

--System details--
Postgres 9.3.4 running on Linux x64 VMWare virtual machine. 24 CPUs of 2.7 GHz. 
48GB RAM
It is also running another application (so the database is not the only thing 
on the server).


---Backtrace (from gdb) ---

#0  0x007328c3 in hash_search_with_hash_value ()
#1  0x006369ba in BufTableLookup ()
#2  0x00639192 in ?? ()
#3  0x00639ade in ReadBufferExtended ()
#4  0x004906d9 in _bt_getbuf ()
#5  0x004910e1 in _bt_pagedel ()
#6  0x004924d1 in ?? ()
#7  0x004926da in ?? ()
#8  0x004928fa in btbulkdelete ()
#9  0x0072cdbd in FunctionCall4Coll ()
#10 0x0048cf58 in index_bulk_delete ()
#11 0x0057bb85 in ?? ()
#12 0x0057cfe9 in lazy_vacuum_rel ()
#13 0x0057af56 in ?? ()
#14 0x0057b28c in vacuum ()
#15 0x0060c8fa in ?? ()
#16 0x0060cd96 in ?? ()
#17 0x0060ce66 in StartAutoVacWorker ()
#18 0x00617602 in ?? ()
#19 
#20 0x7f9210c9d393 in __select_nocancel () from /lib64/libc.so.6
#21 0x00618add in PostmasterMain ()
#22 0x005b58d0 in main ()

#0  0x00491107 in _bt_pagedel ()
#1  0x004924d1 in ?? ()
#2  0x004926da in ?? ()
#3  0x004928fa in btbulkdelete ()
...

#0  0x0047a1ef in hash_any ()
#1  0x007336e9 in tag_hash ()
#2  0x0063916c in ?? ()
#3  0x00639ade in ReadBufferExtended ()
#4  0x004906d9 in _bt_getbuf ()
#5  0x004910e1 in _bt_pagedel ()
#6  0x004924d1 in ?? ()
#7  0x004926da in ?? ()
#8  0x004928fa in btbulkdelete ()
...

---Vacuum settings in postgresql.conf---

# - Cost-Based Vacuum Delay -

vacuum_cost_delay = 10  # 0-100 milliseconds
#vacuum_cost_page_hit = 1   # 0-1 credits