Re: [GENERAL] Avoiding a deadlock

2013-03-12 Thread Albe Laurenz
Paul Jungwirth wrote:
 Out of curiosity: any reason the ORDER BY should be in the subquery?
 It seems like it ought to be in the UPDATE (if that's allowed).

 Hmm, it's not allowed. :-) It's still surprising that you can guarantee
 the order of a multi-row UPDATE by ordering a subquery.

To be honest, I don't think that there is any guarantee for this
to work reliably in all comparable cases, as PostgreSQL does
not guarantee in which order it performs the UPDATEs.

It just happens to work with certain plans (use EXPLAIN
to see what will happen).
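
For what it's worth, a pattern that does not depend on the plan is to take
the row locks explicitly, in a fixed order, before the big UPDATE runs.
This is my own sketch, not from the thread, using the table and column
names quoted below:

```sql
BEGIN;

-- Lock the target rows in ascending id order before the big UPDATE.
-- FOR UPDATE acquires the row locks as the rows are returned, so the
-- ORDER BY fixes the locking order regardless of the UPDATE's plan.
SELECT id
FROM   taggings
WHERE  score IS NOT NULL
ORDER  BY id
FOR UPDATE;

-- The multi-row UPDATE from the thread then runs with all its row
-- locks already held, so it cannot deadlock against any transaction
-- that locks taggings rows in the same ascending-id order.

COMMIT;
```

The cost is that all the locks are taken up front, so concurrent writers
block for the whole duration of the big transaction instead of deadlocking.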

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Avoiding a deadlock

2013-03-11 Thread Albe Laurenz
Paul Jungwirth wrote:
 I have a long-running multi-row UPDATE that is deadlocking with a single-row 
 UPDATE:
 
 2013-03-09 11:07:51 CST ERROR:  deadlock detected
 2013-03-09 11:07:51 CST DETAIL:  Process 18851 waits for ShareLock on
 transaction 10307138; blocked by process 24203.
 Process 24203 waits for ShareLock on transaction 10306996; blocked by
 process 18851.
 Process 18851: UPDATE taggings tg
                SET score_tier = COALESCE(x.perc, 0)
                FROM (SELECT tg2.id,
                             percent_rank() OVER (PARTITION BY tg2.tag_id
                                                  ORDER BY tg2.score ASC) AS perc
                      FROM   taggings tg2, tags t
                      WHERE  tg2.score IS NOT NULL
                      AND    tg2.tag_id = t.id
                      AND    t.tier = 2) AS x
                WHERE tg.id = x.id
                AND   tg.score IS NOT NULL;
 Process 24203: UPDATE taggings SET score = 2 WHERE taggings.id = 29105523
 
 Note that these two queries are actually updating different columns,
 albeit apparently in the same row.

 Is there anything I can do to avoid a deadlock here? The big query does
 nothing else in its transaction; the little query's transaction might
 update several rows from `taggings`, which I guess is the real reason
 for the deadlock.

 I'd be pretty satisfied with approximate values for the big query. As
 you can see, it is just taking the `score` of each `tagging` and
 computing the percentage of times it beats other taggings of the same
 tag. Is there something I can do with transaction isolation levels
 here? I don't care if the big query operates on slightly-out-of-date
 values. Since each query updates different columns, I think there
 should be no issue with them overwriting each other, right?

The problem is that both updates affect the same rows.
It does not matter if they update different columns, since in any
case a new row version is created (read about PostgreSQL's MVCC
implementation in the documentation).

I can only think of two ways to avoid this deadlock:

1) Each of the little transactions modifies no more than one row of the table.

2) All transactions modify table rows in the same order, e.g. ascending id.
   With the big update you can do that by putting an ORDER BY tg2.id into
   the subquery, and with the little transactions you'll have to make sure
   that rows are updated in ascending id order.
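
Concretely, option 2 might look like this. The ids and score values in
the little transaction are made up for illustration:

```sql
-- Big update: order the subquery by id so that (with the current plan)
-- the UPDATE touches rows in ascending id order.
UPDATE taggings tg
SET    score_tier = COALESCE(x.perc, 0)
FROM   (SELECT tg2.id,
               percent_rank() OVER (PARTITION BY tg2.tag_id
                                    ORDER BY tg2.score ASC) AS perc
        FROM   taggings tg2, tags t
        WHERE  tg2.score IS NOT NULL
        AND    tg2.tag_id = t.id
        AND    t.tier = 2
        ORDER  BY tg2.id) AS x
WHERE  tg.id = x.id
AND    tg.score IS NOT NULL;

-- Little transaction: if it updates several rows, issue the UPDATEs
-- one per row, in ascending id order (hypothetical ids shown):
BEGIN;
UPDATE taggings SET score = 2 WHERE id = 29105523;
UPDATE taggings SET score = 5 WHERE id = 29105977;  -- higher id second
COMMIT;
```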

Yours,
Laurenz Albe



Re: [GENERAL] Avoiding a deadlock

2013-03-11 Thread Chris Curvey
On Sat, Mar 9, 2013 at 4:20 PM, Paul Jungwirth
p...@illuminatedcomputing.com wrote:

 I have a long-running multi-row UPDATE that is deadlocking with a
 single-row UPDATE:

 2013-03-09 11:07:51 CST ERROR:  deadlock detected
 2013-03-09 11:07:51 CST DETAIL:  Process 18851 waits for ShareLock on
 transaction 10307138; blocked by process 24203.
 Process 24203 waits for ShareLock on transaction 10306996; blocked
 by process 18851.
 Process 18851: UPDATE taggings tg
                SET score_tier = COALESCE(x.perc, 0)
                FROM (SELECT tg2.id,
                             percent_rank() OVER (PARTITION BY tg2.tag_id
                                                  ORDER BY tg2.score ASC) AS perc
                      FROM   taggings tg2, tags t
                      WHERE  tg2.score IS NOT NULL
                      AND    tg2.tag_id = t.id
                      AND    t.tier = 2) AS x
                WHERE tg.id = x.id
                AND   tg.score IS NOT NULL;
 Process 24203: UPDATE taggings SET score = 2 WHERE taggings.id = 29105523

 Note that these two queries are actually updating different columns,
 albeit apparently in the same row.

 Is there anything I can do to avoid a deadlock here? The big query does
 nothing else in its transaction; the little query's transaction might
 update several rows from `taggings`, which I guess is the real reason for
 the deadlock.

 I'd be pretty satisfied with approximate values for the big query. As you
 can see, it is just taking the `score` of each `tagging` and computing the
 percentage of times it beats other taggings of the same tag. Is there
 something I can do with transaction isolation levels here? I don't care if
 the big query operates on slightly-out-of-date values. Since each query
 updates different columns, I think there should be no issue with them
 overwriting each other, right?

 Thanks,
 Paul


it *might* help to do the calculation work (all those nested SELECTs) and
store the results in a temporary table, then do the update as a second,
simpler join to the temp table.
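
That suggestion might look like this; the temp table name is made up:

```sql
BEGIN;

-- Materialize the ranking first; this part takes no row locks on
-- taggings, only a snapshot read.
CREATE TEMP TABLE tagging_percs ON COMMIT DROP AS
SELECT tg2.id,
       percent_rank() OVER (PARTITION BY tg2.tag_id
                            ORDER BY tg2.score ASC) AS perc
FROM   taggings tg2
JOIN   tags t ON tg2.tag_id = t.id
WHERE  tg2.score IS NOT NULL
AND    t.tier = 2;

-- The UPDATE is now a simple join, so row locks are held for a much
-- shorter window (though, as noted below, not a zero one).
UPDATE taggings tg
SET    score_tier = COALESCE(p.perc, 0)
FROM   tagging_percs p
WHERE  tg.id = p.id
AND    tg.score IS NOT NULL;

COMMIT;
```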


Re: [GENERAL] Avoiding a deadlock

2013-03-11 Thread Alban Hertroys
On 11 March 2013 13:01, Chris Curvey ch...@chriscurvey.com wrote:


 it *might* help to do the calculation work (all those nested SELECTs) and
 store the results in a temporary table, then do the update as a second,
 simpler join to the temp table.


All the suggestions thus far only reduce the window in which a deadlock
can occur.

If you really need to prevent that, you can split off the columns for
one of the two types of updates into a separate table with a foreign
key to the original table. That way your updates happen in different
tables and there's no chance of a deadlock between the two types of
queries.
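
A sketch of that split; the new table name and the column type are my
assumptions, not from the thread:

```sql
-- Move the column written by the big update into its own table, so the
-- two kinds of UPDATE never create new versions of the same row.
CREATE TABLE tagging_scores (
    tagging_id integer PRIMARY KEY REFERENCES taggings (id),
    score_tier double precision  -- percent_rank() returns double precision
);

-- The little transactions keep updating taggings.score as before.
-- The big update now writes only tagging_scores; since a matching row
-- may not exist yet, it would insert-or-update, e.g. with
-- INSERT ... ON CONFLICT (tagging_id) DO UPDATE (9.5+) rather than a
-- plain UPDATE.
```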
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: [GENERAL] Avoiding a deadlock

2013-03-11 Thread Albe Laurenz
Alban Hertroys wrote:
 All the suggestions thus far only reduce the window in which a deadlock
 can occur.

Where do you see a window for deadlocks with my suggestions?

Yours,
Laurenz Albe




Re: [GENERAL] Avoiding a deadlock

2013-03-11 Thread Paul Jungwirth
 2) All transactions modify table rows in the same order, e.g. ascending id.
    With the big update you can do that by putting an ORDER BY tg2.id into
    the subquery, and with the little transactions you'll have to make sure
    that rows are updated in ascending id order.

I agree this would fix the deadlock. It also seems like the least
disruptive way of fixing the problem.

Out of curiosity: any reason the ORDER BY should be in the subquery? It
seems like it ought to be in the UPDATE (if that's allowed).

Thanks,
Paul


Re: [GENERAL] Avoiding a deadlock

2013-03-11 Thread Paul Jungwirth
 Out of curiosity: any reason the ORDER BY should be in the subquery?
 It seems like it ought to be in the UPDATE (if that's allowed).

Hmm, it's not allowed. :-) It's still surprising that you can guarantee the
order of a multi-row UPDATE by ordering a subquery.

Paul

-- 
_
Pulchritudo splendor veritatis.


[GENERAL] Avoiding a deadlock

2013-03-09 Thread Paul Jungwirth
I have a long-running multi-row UPDATE that is deadlocking with a
single-row UPDATE:

2013-03-09 11:07:51 CST ERROR:  deadlock detected
2013-03-09 11:07:51 CST DETAIL:  Process 18851 waits for ShareLock on
transaction 10307138; blocked by process 24203.
Process 24203 waits for ShareLock on transaction 10306996; blocked
by process 18851.
Process 18851: UPDATE taggings tg
               SET score_tier = COALESCE(x.perc, 0)
               FROM (SELECT tg2.id,
                            percent_rank() OVER (PARTITION BY tg2.tag_id
                                                 ORDER BY tg2.score ASC) AS perc
                     FROM   taggings tg2, tags t
                     WHERE  tg2.score IS NOT NULL
                     AND    tg2.tag_id = t.id
                     AND    t.tier = 2) AS x
               WHERE tg.id = x.id
               AND   tg.score IS NOT NULL;
Process 24203: UPDATE taggings SET score = 2 WHERE taggings.id = 29105523

Note that these two queries are actually updating different columns, albeit
apparently in the same row.

Is there anything I can do to avoid a deadlock here? The big query does
nothing else in its transaction; the little query's transaction might
update several rows from `taggings`, which I guess is the real reason for
the deadlock.

I'd be pretty satisfied with approximate values for the big query. As you
can see, it is just taking the `score` of each `tagging` and computing the
percentage of times it beats other taggings of the same tag. Is there
something I can do with transaction isolation levels here? I don't care if
the big query operates on slightly-out-of-date values. Since each query
updates different columns, I think there should be no issue with them
overwriting each other, right?

Thanks,
Paul



-- 
_
Pulchritudo splendor veritatis.