Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Thu, Feb 23, 2017 at 2:42 AM, Bill Moranwrote: > On Wed, 22 Feb 2017 13:19:11 -0800 > Jeff Janes wrote: > > > On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure > wrote: > > > > > > On Thursday, February 16, 2017, Tom Lane wrote: > > > > > >> Tim Bellis writes: > > >> > Even though this is a read only query, is it also expected to be > > >> blocked behind the vacuum? Is there a way of getting indexes for a > table > > >> which won't be blocked behind a vacuum? > > >> > > >> It's not the vacuum that's blocking your read-only queries. It's the > > >> ALTER TABLE, which needs an exclusive lock in order to alter the > table's > > >> schema. The ALTER is queued waiting for the vacuum to finish, and > lesser > > >> lock requests queue up behind it. We could let the non-exclusive lock > > >> requests go ahead of the ALTER, but that would create a severe risk > of the > > >> ALTER *never* getting to run. > > >> > > >> I'd kill the ALTER and figure on trying again after the vacuum is > done. > > >> > > >> > > > I've been drilled by this and similar lock stacking issues enough > times to > > > make me near 100% sure deferring the ALTER would be the better choice > > > > > > > > This seems like a rather one-sided observation. How could you know how > > often the unimplemented behavior also would have "drilled" you, since it > is > > unimplemented? > > > > There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow > other > > requestors jump the queue if they were compatible with the held lock. If > > that is implemented, then you would just manually lock the table > deferably > > before invoking the ALTER TABLE command, if that is the behavior you > wanted > > (but it wouldn't work for things that can't be run in transactions) > > This seems redundant to me. > > We already have LOCK ... NOWAIT. It's fairly trivial to write an upgrade > script that uses LOCK to explicitly lock tables that it's going to ALTER, > then busy-waits if the lock is not immediately grantable. > As fairly trivial as it is, I bet I would mess it up a few times before I got it right. And then it would probably still be wrong in corner cases. What if it fails not because the lock is unavailable, but for some obscure error other than the ones anticipated or encountered during testing? And busy-waiting is generally nasty and a waste of resources. > > The fact that so many ORMs and similar tools don't take advantage of that > functionality is rather depressing. > > In my experience, I've also seen heavily loaded systems that this wouldn't > work on, essentially because there is _always_ _some_ lock on every table. > This is a case where experienced developers are required to take some > extra time to coordinate their upgrades to work around the high load. But > the proposed LOCK TABLE ... DEFERRABLE wouldn't help there either, because > the ALTER would be deferred indefinitely. > True. That is why it would be available only upon request, not the new default. > Personally, I feel like the existing behavior is preferrable. Software > teams need to take the time to understand the locking implications of their > actions or they'll have nothing but trouble anyway. > > As I've seen time and again: writing an application that handles low load > and low concurrency is fairly trivial, but scaling that app up to high > load and/or high concurrency generally sorts out the truely brilliant > developers from the merely average. > So why not give the merely average some better tools? Cheers, Jeff
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Wed, Feb 22, 2017 at 3:19 PM, Jeff Janeswrote: > On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure wrote: >> On Thursday, February 16, 2017, Tom Lane wrote: >>> >>> Tim Bellis writes: >>> > Even though this is a read only query, is it also expected to be >>> > blocked behind the vacuum? Is there a way of getting indexes for a table >>> > which won't be blocked behind a vacuum? >>> >>> It's not the vacuum that's blocking your read-only queries. It's the >>> ALTER TABLE, which needs an exclusive lock in order to alter the table's >>> schema. The ALTER is queued waiting for the vacuum to finish, and lesser >>> lock requests queue up behind it. We could let the non-exclusive lock >>> requests go ahead of the ALTER, but that would create a severe risk of >>> the >>> ALTER *never* getting to run. >>> >>> I'd kill the ALTER and figure on trying again after the vacuum is done. >> >> I've been drilled by this and similar lock stacking issues enough times to >> make me near 100% sure deferring the ALTER would be the better choice > > This seems like a rather one-sided observation. How could you know how > often the unimplemented behavior also would have "drilled" you, since it is > unimplemented? Well, that I can't really say, but at least in my case ALTER TABLE in the face of concurrent application activity can wait but locking tables for reading for an indefinite period will rapidly destabilize the system. An example of this usage is replacing partitions on a warehouse table. About half of my P1s over the last 12 months ago are relating to locking problem of some kind. So I end up during workarounds such as issuing "LOCK...NOWAIT" in a sleep loop :( or application restructuring, especially minimizing use of TRUNCATE. I do think instrumentation around locking behaviors would be helpful. Allowing (optionally) waiters to leapfrog in if they can clear would be wonderful as would being able to specify maximum wait timeouts inside a transaction. FWIW, I'm not sure this behavior makes sense attached to LOCK, I'd rather see them attached generally to SET TRANSACTION -- my 0.02$ (talk is cheap, etc). merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Wed, 22 Feb 2017 13:19:11 -0800 Jeff Janeswrote: > On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure wrote: > > > > On Thursday, February 16, 2017, Tom Lane wrote: > > > >> Tim Bellis writes: > >> > Even though this is a read only query, is it also expected to be > >> blocked behind the vacuum? Is there a way of getting indexes for a table > >> which won't be blocked behind a vacuum? > >> > >> It's not the vacuum that's blocking your read-only queries. It's the > >> ALTER TABLE, which needs an exclusive lock in order to alter the table's > >> schema. The ALTER is queued waiting for the vacuum to finish, and lesser > >> lock requests queue up behind it. We could let the non-exclusive lock > >> requests go ahead of the ALTER, but that would create a severe risk of the > >> ALTER *never* getting to run. > >> > >> I'd kill the ALTER and figure on trying again after the vacuum is done. > >> > >> > > I've been drilled by this and similar lock stacking issues enough times to > > make me near 100% sure deferring the ALTER would be the better choice > > > > > This seems like a rather one-sided observation. How could you know how > often the unimplemented behavior also would have "drilled" you, since it is > unimplemented? > > There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow other > requestors jump the queue if they were compatible with the held lock. If > that is implemented, then you would just manually lock the table deferably > before invoking the ALTER TABLE command, if that is the behavior you wanted > (but it wouldn't work for things that can't be run in transactions) This seems redundant to me. We already have LOCK ... NOWAIT. It's fairly trivial to write an upgrade script that uses LOCK to explicitly lock tables that it's going to ALTER, then busy-waits if the lock is not immediately grantable. The fact that so many ORMs and similar tools don't take advantage of that functionality is rather depressing. In my experience, I've also seen heavily loaded systems that this wouldn't work on, essentially because there is _always_ _some_ lock on every table. This is a case where experienced developers are required to take some extra time to coordinate their upgrades to work around the high load. But the proposed LOCK TABLE ... DEFERRABLE wouldn't help there either, because the ALTER would be deferred indefinitely. Personally, I feel like the existing behavior is preferrable. Software teams need to take the time to understand the locking implications of their actions or they'll have nothing but trouble anyway. As I've seen time and again: writing an application that handles low load and low concurrency is fairly trivial, but scaling that app up to high load and/or high concurrency generally sorts out the truely brilliant developers from the merely average. -- Bill Moran -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Fri, Feb 17, 2017 at 10:36 AM, Tim Bellis <tim.bel...@metaswitch.com> wrote: > > > > > *From:* Jeff Janes [mailto:jeff.ja...@gmail.com] > *Sent:* 17 February 2017 02:59 > *To:* Tim Bellis <tim.bel...@metaswitch.com> > *Cc:* pgsql-general@postgresql.org > *Subject:* Re: [GENERAL] Autovacuum stuck for hours, blocking queries > > > > On Wed, Feb 15, 2017 at 9:30 AM, Tim Bellis <tim.bel...@metaswitch.com> > wrote: > > I have a postgres 9.3.4 database table which (intermittently but reliably) > gets into a state where queries get blocked indefinitely (at least for many > hours) behind an automatic vacuum. I was under the impression that vacuum > should never take any blocking locks for any significant period of time, > and so would like help resolving the issue. > > The process blocking the query is: > postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum > worker process > which is running the query > autovacuum: VACUUM public. > > > > Are you sure it doesn't really say: > > > > autovacuum: VACUUM public. (to prevent wraparound) > > *[Tim Bellis] It doesn’t. I was using the query from * > *https://wiki.postgresql.org/wiki/Lock_Monitoring* > <https://wiki.postgresql.org/wiki/Lock_Monitoring>* and looking at the > ‘current_statement_in_blocking_process’ column. Is there a different query > I should be using?* > That query seems to be a bit mangled. At one time, it only found row-level locks. Someone changed that, but didn't remove the comment "these only find row-level locks, not object-level locks" Also, the "WHERE NOT blocked_locks.GRANTED" should perhaps be: WHERE NOT blocked_locks.GRANTED and blocking_locks.GRANTED; As it is, every waiting query reports that it is waiting on all of its fellow victims as well as the thing(s) actually blocking it. But my WHERE clause is not really correct either, as it is possible that it is one blocked thing is being blocked by a different blocked thing which is ahead of it in the queue, when without that intervening blocked requestor it could be immediately granted if its request mode is compatible with the held mode(s). I don't think there is a query that can reveal what is most immediately blocking it. But, I don't see how this explains what you see. An autovacuum without "(to prevent wraparound)" should not block anything for much more than a second (unless you changed deadlock_timeout) and should not be blocked by anything either as it just gives up on the operation if the lock is not immediately available. Cheers, Jeff
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncurewrote: > > > On Thursday, February 16, 2017, Tom Lane wrote: > >> Tim Bellis writes: >> > Even though this is a read only query, is it also expected to be >> blocked behind the vacuum? Is there a way of getting indexes for a table >> which won't be blocked behind a vacuum? >> >> It's not the vacuum that's blocking your read-only queries. It's the >> ALTER TABLE, which needs an exclusive lock in order to alter the table's >> schema. The ALTER is queued waiting for the vacuum to finish, and lesser >> lock requests queue up behind it. We could let the non-exclusive lock >> requests go ahead of the ALTER, but that would create a severe risk of the >> ALTER *never* getting to run. >> >> I'd kill the ALTER and figure on trying again after the vacuum is done. >> >> > I've been drilled by this and similar lock stacking issues enough times to > make me near 100% sure deferring the ALTER would be the better choice > > This seems like a rather one-sided observation. How could you know how often the unimplemented behavior also would have "drilled" you, since it is unimplemented? There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow other requestors jump the queue if they were compatible with the held lock. If that is implemented, then you would just manually lock the table deferably before invoking the ALTER TABLE command, if that is the behavior you wanted (but it wouldn't work for things that can't be run in transactions) Ideally each requestor would specify if they will hold the lock for a long timer or a short time. Them a short requestor which is blocked behind a long requestor could let other compatible-with-held requests jump over it. But once it was only blocked by short locks, it would reassert the normal order, so it can't get permanently blocked by a constantly overlapping stream of short locks. But how would you get all lock requestors to provide a reasonable estimate? Cheers, Jeff
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
-Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: 16 February 2017 22:40 To: Tim Bellis <tim.bel...@metaswitch.com> Cc: Adrian Klaver <adrian.kla...@aklaver.com>; pgsql-general@postgresql.org; Alvaro Herrera <alvhe...@2ndquadrant.com>; Scott Marlowe <scott.marl...@gmail.com> Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries Tim Bellis <tim.bel...@metaswitch.com> writes: > > Even though this is a read only query, is it also expected to be blocked > > behind the vacuum? Is there a way of getting indexes for a table which > > won't be blocked behind a vacuum? > It's not the vacuum that's blocking your read-only queries. It's the ALTER > TABLE, which needs an exclusive lock in order to alter the table's schema. > The ALTER is queued waiting for the vacuum to finish, and lesser lock > requests queue up behind it. We could let the non-exclusive lock requests go > ahead of the ALTER, but that would create a severe risk of the ALTER *never* > getting to run. The lock monitoring query (https://wiki.postgresql.org/wiki/Lock_Monitoring) said that the blocking_pid and the current_statement_in_blocking_process for the queries reading the index data was the autovacuum, not the ALTER. Am I reading the output wrong? Does it not correctly represent the chain of locks? > I'd kill the ALTER and figure on trying again after the vacuum is done. > Also you might want to look into how you got into a situation where you have > an anti-wraparound vacuum that's taking so long to run. > You didn't do something silly like disable autovacuum did you? No, autovacuum is on (and this is an autovacuum which is in progress). But I will look at why I'm getting a blocking autovacuum. > regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Thursday, February 16, 2017, Tom Lanewrote: > Tim Bellis > writes: > > Even though this is a read only query, is it also expected to be blocked > behind the vacuum? Is there a way of getting indexes for a table which > won't be blocked behind a vacuum? > > It's not the vacuum that's blocking your read-only queries. It's the > ALTER TABLE, which needs an exclusive lock in order to alter the table's > schema. The ALTER is queued waiting for the vacuum to finish, and lesser > lock requests queue up behind it. We could let the non-exclusive lock > requests go ahead of the ALTER, but that would create a severe risk of the > ALTER *never* getting to run. > > I'd kill the ALTER and figure on trying again after the vacuum is done. > > I've been drilled by this and similar lock stacking issues enough times to make me near 100% sure deferring the ALTER would be the better choice merlin
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On 02/17/2017 11:54 PM, Michael Paquier wrote: On Sat, Feb 18, 2017 at 1:32 PM, Scott Marlowewrote: Yes it can. Truncate has been rollbackable for a while now. Per the docs: https://www.postgresql.org/docs/9.6/static/sql-truncate.html "TRUNCATE is transaction-safe with respect to the data in the tables: the truncation will be safely rolled back if the surrounding transaction does not commit." In short yes a transaction doing a truncate can be rollbacked. I think the part that confuses people into thinking it can not be rollbacked is this: "TRUNCATE is not MVCC-safe. After truncation, the table will appear empty to concurrent transactions, if they are using a snapshot taken before the truncation occurred. See Section 13.5 for more details." -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Sat, Feb 18, 2017 at 1:32 PM, Scott Marlowewrote: > Yes it can. Truncate has been rollbackable for a while now. Per the docs: https://www.postgresql.org/docs/9.6/static/sql-truncate.html "TRUNCATE is transaction-safe with respect to the data in the tables: the truncation will be safely rolled back if the surrounding transaction does not commit." In short yes a transaction doing a truncate can be rollbacked. -- Michael -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Fri, Feb 17, 2017 at 1:38 PM, Rakesh Kumarwrote: > LOCK TABLE yourtable ; > CREATE TEMPORARY TABLE keep AS SELECT * FROM yourtable WHERE ; > TRUNCATE yourtable; > INSERT INTO yourtable SELECT * from keep; > COMMIT; > === > the above snippet assumes truncate in PG can be in a transaction. In other > words, while truncate by itself > is atomic, it can't be rolled back. So in the above case, if "INSERT INTO > yourtable SELECT * from keep;" and > we rollback, will it rollback yourtable. Yes it can. Truncate has been rollbackable for a while now. begin; create table insert into table truncate old table . something goes wrong . rollback; Unless I misunderstand your meaning. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
LOCK TABLE yourtable ; CREATE TEMPORARY TABLE keep AS SELECT * FROM yourtable WHERE ; TRUNCATE yourtable; INSERT INTO yourtable SELECT * from keep; COMMIT; === the above snippet assumes truncate in PG can be in a transaction. In other words, while truncate by itself is atomic, it can't be rolled back. So in the above case, if "INSERT INTO yourtable SELECT * from keep;" and we rollback, will it rollback yourtable. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
Hi Tim, Am 2017-02-17 um 17:02 schrieb Tim Bellis: The DELETE operations only deletes rows from the > previous day. It's possible that there have been rows > added that day which ought not to be deleted, so > TRUNCATE wouldn't work. OK, then I'll try two other suggestions: - use table partitioning ( https://www.postgresql.org/docs/9.3/static/ddl-partitioning.html ) - if the number of rows you need to keep is small, you could try something like this: LOCK TABLE yourtable ; CREATE TEMPORARY TABLE keep AS SELECT * FROM yourtable WHERE ; TRUNCATE yourtable; INSERT INTO yourtable SELECT * from keep; COMMIT; Best regards, -hannes -Original Message- From: Hannes Erven [mailto:han...@erven.at] Sent: 17 February 2017 11:47 To: pgsql-general@postgresql.org Cc: Tim Bellis <tim.bel...@metaswitch.com> Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries Hi Tim, Am 2017-02-15 um 18:30 schrieb Tim Bellis: I have a postgres 9.3.4 database table which (intermittently but reliably) > gets into a state where queries get blocked indefinitely > [..] Notes: - This database table is used for about 6 million row writes per day, > all of which are then deleted at the end of the day. If you are dumping the contents of the table anyways, why not use TRUNCATE instead of DELETE? It unlinks and recreates the table data files, requiring nearly zero IO and analyzing. Or even drop, and recreate the table with the correct new structure so you do not even need to ALTER TABLE ? I'm a bit skeptical of these suggestions since very competent people have already answered your post and did not come up with this... ;-) the only drawback I'm aware of is that TRUNCATE will immediatly free disk space on the OS level, so the table's space will not be "blocked". But probably the VACUUM you are currently performing will also eventually release the unused disk space, so this may or may not match the current behaviour. Best regards, -hannes -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
From: Jeff Janes [mailto:jeff.ja...@gmail.com] Sent: 17 February 2017 02:59 To: Tim Bellis <tim.bel...@metaswitch.com> Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries On Wed, Feb 15, 2017 at 9:30 AM, Tim Bellis <tim.bel...@metaswitch.com<mailto:tim.bel...@metaswitch.com>> wrote: I have a postgres 9.3.4 database table which (intermittently but reliably) gets into a state where queries get blocked indefinitely (at least for many hours) behind an automatic vacuum. I was under the impression that vacuum should never take any blocking locks for any significant period of time, and so would like help resolving the issue. The process blocking the query is: postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum worker process which is running the query autovacuum: VACUUM public. Are you sure it doesn't really say: autovacuum: VACUUM public. (to prevent wraparound) [Tim Bellis] It doesn’t. I was using the query from https://wiki.postgresql.org/wiki/Lock_Monitoring and looking at the ‘current_statement_in_blocking_process’ column. Is there a different query I should be using? If it doesn't include the "to prevent wraparound", then it should sacrifice itself as soon as it realizes it is blocking something else. If it is not doing that, something is wrong. If it does say "(to prevent wraparound)", then see all the other comments on this thread. Notes: - This database table is used for about 6 million row writes per day, all of which are then deleted at the end of the day. How many transactions do those 6 million writes comprise? [Tim Bellis] I’m not sure - I think it’s 6 million transactions, but there might be some batching going on that I’m unaware of. What would this affect? (I can dig in if necessary) (I might have been slightly wrong in characterising the exact behaviour; the table might be cleared every hour rather than every day, but there are still 6 million writes per day) - Our application kicks off a manual vacuum against this table each night which doesn't hit this problem, as far as we're aware. If it were having problems, would you be aware of it? Do you see in the log files the completion of the vacuum? Or look in pg_stat_user_tables to see when last_vacuum was. If it runs every night and succeeds, it is hard to see why wraparound would ever kick in. Unless you are hitting 150,000,000 transactions in a day. [Tim Bellis] I shall investigate this. Cheers, Jeff
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
The DELETE operations only deletes rows from the previous day. It's possible that there have been rows added that day which ought not to be deleted, so TRUNCATE wouldn't work. But that was a helpful suggestion - thanks! Tim -Original Message- From: Hannes Erven [mailto:han...@erven.at] Sent: 17 February 2017 11:47 To: pgsql-general@postgresql.org Cc: Tim Bellis <tim.bel...@metaswitch.com> Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries Hi Tim, Am 2017-02-15 um 18:30 schrieb Tim Bellis: > I have a postgres 9.3.4 database table which (intermittently but > reliably) > gets into a state where queries get blocked indefinitely > [..] > Notes: > - This database table is used for about 6 million row writes per > day, > all of which are then deleted at the end of the day. If you are dumping the contents of the table anyways, why not use TRUNCATE instead of DELETE? It unlinks and recreates the table data files, requiring nearly zero IO and analyzing. Or even drop, and recreate the table with the correct new structure so you do not even need to ALTER TABLE ? I'm a bit skeptical of these suggestions since very competent people have already answered your post and did not come up with this... ;-) the only drawback I'm aware of is that TRUNCATE will immediatly free disk space on the OS level, so the table's space will not be "blocked". But probably the VACUUM you are currently performing will also eventually release the unused disk space, so this may or may not match the current behaviour. Best regards, -hannes -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
Hi Tim, Am 2017-02-15 um 18:30 schrieb Tim Bellis: I have a postgres 9.3.4 database table which (intermittently but reliably) > gets into a state where queries get blocked indefinitely > [..] Notes: - This database table is used for about 6 million row writes per day, > all of which are then deleted at the end of the day. If you are dumping the contents of the table anyways, why not use TRUNCATE instead of DELETE? It unlinks and recreates the table data files, requiring nearly zero IO and analyzing. Or even drop, and recreate the table with the correct new structure so you do not even need to ALTER TABLE ? I'm a bit skeptical of these suggestions since very competent people have already answered your post and did not come up with this... ;-) the only drawback I'm aware of is that TRUNCATE will immediatly free disk space on the OS level, so the table's space will not be "blocked". But probably the VACUUM you are currently performing will also eventually release the unused disk space, so this may or may not match the current behaviour. Best regards, -hannes -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
Tom Lane wrote: > Also you might want to look into how you got into a situation where > you have an anti-wraparound vacuum that's taking so long to run. If there are ALTERs running all the time, regular (non-anti-wraparound) vacuums would be canceled and never get a chance to run. Eventually, autovacuum decides it's had enough and doesn't cancel anymore, so everyone else gets stuck behind. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
Tim Belliswrites: > Even though this is a read only query, is it also expected to be blocked > behind the vacuum? Is there a way of getting indexes for a table which won't > be blocked behind a vacuum? It's not the vacuum that's blocking your read-only queries. It's the ALTER TABLE, which needs an exclusive lock in order to alter the table's schema. The ALTER is queued waiting for the vacuum to finish, and lesser lock requests queue up behind it. We could let the non-exclusive lock requests go ahead of the ALTER, but that would create a severe risk of the ALTER *never* getting to run. I'd kill the ALTER and figure on trying again after the vacuum is done. Also you might want to look into how you got into a situation where you have an anti-wraparound vacuum that's taking so long to run. You didn't do something silly like disable autovacuum did you? regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On 02/16/2017 08:45 AM, Tim Bellis wrote: Thank you all - that's really useful :-) The other query that gets blocked behind the vacuum is the below (truncated). This query is generated by jdbc in this method: org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getIndexInfo(AbstractJdbc2DatabaseMetaData.java:4023) Is JDBC doing anything else before issuing this? Even though this is a read only query, is it also expected to be blocked behind the vacuum? Is there a way of getting indexes for a table which won't be blocked behind a vacuum? Table 13.2 here: https://www.postgresql.org/docs/9.5/static/explicit-locking.html shows the conflicts with SHARE UPDATE EXCLUSIVE(vacuum). pg_locks: https://www.postgresql.org/docs/9.5/static/view-pg-locks.html shows locks being held. So next time it happens I would take a look and see if you can work backwards from there. You could directly access the index information using: https://www.postgresql.org/docs/9.6/static/catalog-pg-index.html https://www.postgresql.org/docs/9.6/static/catalog-pg-class.html Thank you all again, Tim SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM, ct.relname AS TABLE_NAME, NOT i.indisunique AS NON_UNIQUE, NULL AS INDEX_QUALIFIER, ci.relname AS INDEX_NAME, CASE i.indisclustered WHEN true THEN 1ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3END END AS TYPE, (i.keys).n AS ORDINAL_POSITION, pg_catalog.pg_get_indexdef(ci.oid, (i.keys).n, false) AS COLUMN_NAME, CASE am.amcanorder WHEN true THEN CASE i.indoption[(i.keys).n - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS ASC_OR_DESC, ci.reltuples AS CARDINALITY, ci.relpages AS PAGES, pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS FILTER_CONDITION FROM pg_catalog.pg_class ct JOIN pg_catalog.pg_namespace n ON (ct.relnamespace = n.oid) JOIN (SELECT i.indexrelid, i.indrelid, i.indoption, i.indisunique, i.indisclustered, i.indpred, i.indexprs, information_schema._pg_expandarray(i.indkey) AS keys FROM pg_catalog.pg_index i) i ON (ct.oid = i.ind This query is cut off so cannot say whether it is the issue or not. -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
Thank you all - that's really useful :-) The other query that gets blocked behind the vacuum is the below (truncated). This query is generated by jdbc in this method: org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getIndexInfo(AbstractJdbc2DatabaseMetaData.java:4023) Even though this is a read only query, is it also expected to be blocked behind the vacuum? Is there a way of getting indexes for a table which won't be blocked behind a vacuum? Thank you all again, Tim SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM, ct.relname AS TABLE_NAME, NOT i.indisunique AS NON_UNIQUE, NULL AS INDEX_QUALIFIER, ci.relname AS INDEX_NAME, CASE i.indisclustered WHEN true THEN 1ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3END END AS TYPE, (i.keys).n AS ORDINAL_POSITION, pg_catalog.pg_get_indexdef(ci.oid, (i.keys).n, false) AS COLUMN_NAME, CASE am.amcanorder WHEN true THEN CASE i.indoption[(i.keys).n - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS ASC_OR_DESC, ci.reltuples AS CARDINALITY, ci.relpages AS PAGES, pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS FILTER_CONDITION FROM pg_catalog.pg_class ct JOIN pg_catalog.pg_namespace n ON (ct.relnamespace = n.oid) JOIN (SELECT i.indexrelid, i.indrelid, i.indoption, i.indisunique, i.indisclustered, i.indpred, i.indexprs, information_schema._pg_expandarray(i.indkey) AS keys FROM pg_catalog.pg_index i) i ON (ct.oid = i.ind -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
Scott Marlowe wrote: > Then a ddl gets in line. It has to wait on the vacuum, and the vacuum, > set to run super slow. And everybody waits. On vacuum. Note that this is normally not seen, because autovacuum cancels itself when somebody is blocked behind it -- until the table reaches the freeze_max_age limit, and then autovacuum is a for-wraparound one that is no longer terminated, and then everybody has to wait on it. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Wed, Feb 15, 2017 at 3:26 PM, Scott Marlowewrote: > On Wed, Feb 15, 2017 at 10:30 AM, Tim Bellis > wrote: >> I have a postgres 9.3.4 database table which (intermittently but reliably) >> gets into a state where queries get blocked indefinitely (at least for many >> hours) behind an automatic vacuum. I was under the impression that vacuum >> should never take any blocking locks for any significant period of time, and >> so would like help resolving the issue. >> >> The process blocking the query is: >> postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum >> worker process >> which is running the query >> autovacuum: VACUUM public. >> >> The query being blocked is: >> ALTER TABLE ALTER COLUMN DROP DEFAULT >> (But I have seen this previously with other queries being blocked. I used >> the SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine >> which queries were blocked) > > There are also ways of making the table less likely / not likely / > will not get vacuum automatically. If you're willing to schedule ddl > and vacuum on your own you can then mix the two in relative safety. Followup: https://www.postgresql.org/docs/9.3/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Wed, Feb 15, 2017 at 10:30 AM, Tim Belliswrote: > I have a postgres 9.3.4 database table which (intermittently but reliably) > gets into a state where queries get blocked indefinitely (at least for many > hours) behind an automatic vacuum. I was under the impression that vacuum > should never take any blocking locks for any significant period of time, and > so would like help resolving the issue. > > The process blocking the query is: > postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum worker > process > which is running the query > autovacuum: VACUUM public. > > The query being blocked is: > ALTER TABLE ALTER COLUMN DROP DEFAULT > (But I have seen this previously with other queries being blocked. I used the > SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which > queries were blocked) Yup, there's a priority inversion in DDL, DML and maintenance (vacuum). Vacuum runs slow in the background. Normal update/delete/insert work fine because of the type of lock vacuum has. Then a ddl gets in line. It has to wait on the vacuum, and the vacuum, set to run super slow. And everybody waits. On vacuum. Basically it's bad practice to alter tables that are big and being worked on, because one way or another you're going to pay a price. I've used partitions for logging and auditing that autocreate and drop and vacuum, but they never get ddl done on them when they're getting updated and vice versa. There are also ways of making the table less likely / not likely / will not get vacuum automatically. If you're willing to schedule ddl and vacuum on your own you can then mix the two in relative safety. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On 02/15/2017 09:30 AM, Tim Bellis wrote: I have a postgres 9.3.4 database table which (intermittently but reliably) gets into a state where queries get blocked indefinitely (at least for many hours) behind an automatic vacuum. I was under the impression that vacuum should never take any blocking locks for any significant period of time, and so would like help resolving the issue. The process blocking the query is: postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum worker process which is running the query autovacuum: VACUUM public. The query being blocked is: ALTER TABLE ALTER COLUMN DROP DEFAULT (But I have seen this previously with other queries being blocked. I used the SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which queries were blocked) Other ALTER TABLE queries? If so I believe this might apply: https://www.postgresql.org/docs/9.5/static/explicit-locking.html SHARE UPDATE EXCLUSIVE Conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent schema changes and VACUUM runs. Acquired by VACUUM (without FULL), ANALYZE, CREATE INDEX CONCURRENTLY, and ALTER TABLE VALIDATE and other ALTER TABLE variants (for full details see ALTER TABLE). -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Autovacuum stuck for hours, blocking queries
I have a postgres 9.3.4 database table which (intermittently but reliably) gets into a state where queries get blocked indefinitely (at least for many hours) behind an automatic vacuum. I was under the impression that vacuum should never take any blocking locks for any significant period of time, and so would like help resolving the issue. The process blocking the query is: postgres 21985 11304 98 Feb13 ?1-14:20:52 postgres: autovacuum worker process which is running the query autovacuum: VACUUM public. The query being blocked is: ALTER TABLE ALTER COLUMN DROP DEFAULT (But I have seen this previously with other queries being blocked. I used the SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which queries were blocked) Notes: - This database table is used for about 6 million row writes per day, all of which are then deleted at the end of the day. - We have a reasonable number of systems running identical databases and near identical load profiles, and this problem has only been seen on two of those systems. It occurs intermittently but reliably (once every few days). It persists until postgres is restarted (usually accomplished by a reboot) or the autovacuum thread is forcibly killed. Although it often happens again after a short period of time when the autovacuum worker process starts. - I don't have data on whether this vacuum ever finishes - it certainly takes longer than a few hours. - Our application kicks off a manual vacuum against this table each night which doesn't hit this problem, as far as we're aware. - I have sometimes seen this behaviour when there was database corruption. In that case I see logs like the following. But it also occurs when there are no logs indicating database corruption (which is the purpose of this email) LOG: no left sibling (concurrent deletion?) in "" ERROR: right sibling's left-link doesn't match: block 41 links to 127 instead of expected 79 in index "" Things that I have tried: - Printing out backtrace a few times from connecting to that process with gdb (see below) - This indicates that the vacuum process isn't stuck, but I don't know what else it says - Reading the FAQ and release notes of other 9.3.x versions to see whether this looks like a known issue. I couldn't see anything obvious apart from an issue about files (which didn't apply to my system - there was SQL in the release note to check). - Forcibly killing the autovacuum thread (obviously not a good solution to the problem!). This causes the ALTER TABLE query to complete correctly. Questions: - Is this expected? I was under the impression that vacuum should never take any blocking locks for any significant period of time. - Should I report this as a bug? - Should I change some config options to avoid this? (I know turning off autovacuum would avoid it, but that's not good practice) I've put the vacuum settings for my database below. - Should I avoid certain queries which autovacuum is happening? - Are there more diags that I should gather to help diagnose this issue? Thanks! Tim --System details-- Postgres 9.3.4 running on Linux x64 VMWare virtual machine. 24 CPUs of 2.7 GHz. 48GB RAM It is also running another application (so the database is not the only thing on the server). ---Backtrace (from gdb) --- #0 0x007328c3 in hash_search_with_hash_value () #1 0x006369ba in BufTableLookup () #2 0x00639192 in ?? () #3 0x00639ade in ReadBufferExtended () #4 0x004906d9 in _bt_getbuf () #5 0x004910e1 in _bt_pagedel () #6 0x004924d1 in ?? () #7 0x004926da in ?? () #8 0x004928fa in btbulkdelete () #9 0x0072cdbd in FunctionCall4Coll () #10 0x0048cf58 in index_bulk_delete () #11 0x0057bb85 in ?? () #12 0x0057cfe9 in lazy_vacuum_rel () #13 0x0057af56 in ?? () #14 0x0057b28c in vacuum () #15 0x0060c8fa in ?? () #16 0x0060cd96 in ?? () #17 0x0060ce66 in StartAutoVacWorker () #18 0x00617602 in ?? () #19 #20 0x7f9210c9d393 in __select_nocancel () from /lib64/libc.so.6 #21 0x00618add in PostmasterMain () #22 0x005b58d0 in main () #0 0x00491107 in _bt_pagedel () #1 0x004924d1 in ?? () #2 0x004926da in ?? () #3 0x004928fa in btbulkdelete () ... #0 0x0047a1ef in hash_any () #1 0x007336e9 in tag_hash () #2 0x0063916c in ?? () #3 0x00639ade in ReadBufferExtended () #4 0x004906d9 in _bt_getbuf () #5 0x004910e1 in _bt_pagedel () #6 0x004924d1 in ?? () #7 0x004926da in ?? () #8 0x004928fa in btbulkdelete () ... ---Vacuum settings in postgresql.conf--- # - Cost-Based Vacuum Delay - vacuum_cost_delay = 10 # 0-100 milliseconds #vacuum_cost_page_hit = 1 # 0-1 credits