[
https://issues.apache.org/jira/browse/CONNECTORS-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642646#comment-13642646
]
Karl Wright commented on CONNECTORS-678:
----------------------------------------
FWIW, it is now becoming clear that this query plan is failing because the
jobqueue table's stats are out of date. Unfortunately, when the stats are
computed, there are typically very few items waiting to be processed on the
queue. And the PostgreSQL planner is VERY sensitive to the statistical values
zero and one; it takes them as gospel and throws away perfectly reasonable
plans because it thinks it can do a little bit better. So when the stats say
there are zero or one records matching the query, and in fact there are a
couple of thousand, the plan is going to be wrong - even though there may be
millions of rows in this table, and the existing statistics are in fact
accurate to four decimal places.
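If that is the diagnosis, the staleness should be directly observable. A
minimal sketch of how to check (the catalog views are standard PostgreSQL;
only the interactive session itself is hypothetical):
{code}
-- When were jobqueue's statistics last refreshed, and how many live rows
-- does PostgreSQL currently think it has?
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'jobqueue';

-- What frequencies does the planner believe for the status column? If 'P'
-- and 'G' are absent or near-zero here, the rows=1 estimates follow.
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'jobqueue' AND attname = 'status';
{code}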
If we confirm that this is indeed what is taking place, the only ways I can
think of to get around the problem are the following:
- Hope there is some way we can force PostgreSQL to plan things a bit
differently, e.g. via planner settings (see the sketch after this list).
We'll go through that process later today (if Erlend gets those statistics)
to see what can be done via this route.
- Analyze the table before every stuffing query (also sketched below). This
may be an option on PostgreSQL; the code used to do this, but I removed it
because it made MySQL useless for large tables. I could add it back for the
PostgreSQL database alone, although it adds a considerable cost.
- Add "doppleganger" records to the jobqueue table. These would be a couple of
dozen records not intended to be processed, and marked as "dummy" records,
there just to prevent Postgresql's planner from making stupid decisions. If we
wind up having to go this route, you can bet I will also open a Postgresql
planner ticket to deal with the whole issue of not assuming there are error
bars in the statistics estimates. We cannot be the only people affected by
this aspect of Postgresql's decision-making process. Such an approach would
require a schema change and would certainly delay our release.
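For reference, here is roughly what the first two options would look like at
the SQL level. This is a sketch only; none of it exists in ManifoldCF today,
and which planner setting (if any) is the right one to toggle depends on what
the statistics investigation shows:
{code}
-- Option 1: steer the planner away from the nested-loop plan for this
-- session only, run the stuffer query, then restore the default.
SET enable_nestloop = off;
-- ... stuffer query goes here ...
RESET enable_nestloop;

-- Option 2: refresh jobqueue's statistics immediately before each
-- stuffing query, at the cost of an ANALYZE pass every time.
ANALYZE jobqueue;
{code}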
> PostgreSQL generating unusual and slow (76 second) plans for stuffer queries
> ----------------------------------------------------------------------------
>
> Key: CONNECTORS-678
> URL: https://issues.apache.org/jira/browse/CONNECTORS-678
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework core
> Affects Versions: ManifoldCF 1.1.1
> Reporter: Karl Wright
> Assignee: Karl Wright
> Priority: Critical
> Fix For: ManifoldCF 1.2
>
>
> Query plans like this seem to be taking place:
> {code}
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan: Limit  (cost=9597.49..9597.49 rows=1 width=155)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:   ->  Sort  (cost=9597.49..9597.49 rows=1 width=155)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:         Sort Key: t0.docpriority
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:         ->  Nested Loop Anti Join  (cost=17.68..9597.48 rows=1 width=155)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:               Join Filter: (t2.jobid <> t0.jobid)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:               ->  Nested Loop Semi Join  (cost=17.68..9588.87 rows=1 width=155)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                     Join Filter: (t0.jobid = t1.id)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                     ->  Nested Loop Anti Join  (cost=17.68..9586.81 rows=1 width=155)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                           Join Filter: (t0.id = t3.owner)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                           ->  Bitmap Heap Scan on jobqueue t0  (cost=4.53..6.54 rows=1 width=155)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                                 Recheck Cond: (status = ANY ('{P,G}'::bpchar[]))
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                                 Filter: ((checktime <= 1366628406182::bigint) AND (checkaction = 'R'::bpchar))
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                                 ->  Bitmap Index Scan on i1362584563122  (cost=0.00..4.53 rows=1 width=0)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                                       Index Cond: (status = ANY ('{P,G}'::bpchar[]))
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                           ->  Hash Join  (cost=13.15..7635.56 rows=231554 width=8)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                                 Hash Cond: ((t3.eventname)::text = (t4.name)::text)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                                 ->  Seq Scan on prereqevents t3  (cost=0.00..4438.54 rows=231554 width=37)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                                 ->  Hash  (cost=11.40..11.40 rows=140 width=516)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                                       ->  Seq Scan on events t4  (cost=0.00..11.40 rows=140 width=516)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                     ->  Seq Scan on jobs t1  (cost=0.00..2.03 rows=2 width=8)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                           Filter: ((status = ANY ('{A,a}'::bpchar[])) AND (priority = 5::bigint))
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:               ->  Index Scan using i1362584563120 on jobqueue t2  (cost=0.00..4.30 rows=1 width=49)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                     Index Cond: ((dochash)::text = (t0.dochash)::text)
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) - Plan:                     Filter: (status = ANY ('{A,F,a,f,D,d}'::bpchar[]))
> WARN 2013-04-22 13:01:20,414 (Stuffer thread) -
> {code}
> Here's the query:
> {code}
> EXPLAIN SELECT
>   t0.id, t0.jobid, t0.dochash, t0.docid, t0.status, t0.failtime, t0.failcount, t0.priorityset
> FROM jobqueue t0
> WHERE t0.status IN ('P','G')
>   AND t0.checkaction = 'R'
>   AND t0.checktime <= 1366628406182
>   AND EXISTS (SELECT 'x' FROM jobs t1
>               WHERE t1.status IN ('A','a') AND t1.id = t0.jobid AND t1.priority = 5)
>   AND NOT EXISTS (SELECT 'x' FROM jobqueue t2
>                   WHERE t2.dochash = t0.dochash
>                     AND t2.status IN ('A','F','a','f','D','d')
>                     AND t2.jobid != t0.jobid)
>   AND NOT EXISTS (SELECT 'x' FROM prereqevents t3, events t4
>                   WHERE t0.id = t3.owner AND t3.eventname = t4.name)
> ORDER BY t0.docpriority ASC
> LIMIT 100;
> {code}
> This query was noted in the log as taking 76 seconds to execute.
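> The plan above shows estimated costs only. A hypothetical follow-up (not
> part of the original report) would be to rerun the same statement under
> EXPLAIN ANALYZE, which prints actual row counts next to each estimate and
> would show whether the rows=1 nodes are really producing thousands of rows:
> {code}
> -- Same statement as above, executed rather than merely planned, so each
> -- node reports "actual rows" alongside the planner's estimate.
> EXPLAIN ANALYZE SELECT
>   t0.id, t0.jobid, t0.dochash, t0.docid, t0.status, t0.failtime, t0.failcount, t0.priorityset
> FROM jobqueue t0
> WHERE t0.status IN ('P','G') AND t0.checkaction = 'R' AND t0.checktime <= 1366628406182
>   AND EXISTS (SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id = t0.jobid AND t1.priority = 5)
>   AND NOT EXISTS (SELECT 'x' FROM jobqueue t2 WHERE t2.dochash = t0.dochash
>                   AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid != t0.jobid)
>   AND NOT EXISTS (SELECT 'x' FROM prereqevents t3, events t4 WHERE t0.id = t3.owner AND t3.eventname = t4.name)
> ORDER BY t0.docpriority ASC LIMIT 100;
> {code}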