[
https://issues.apache.org/jira/browse/CONNECTORS-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129922#comment-14129922
]
Paul Boichat commented on CONNECTORS-1027:
------------------------------------------
Quert Plans: - Note that I only see this performance difference when a job is
running (possibly as a result of documents being in a P or G status
Original Query:
explain analyze SELECT
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,
t0.priorityset FROM jobqueue t0 WHERE t0.status IN ('P','G') AND
t0.checkaction='R' AND t0.checktime <= 1407246846166 AND EXISTS (SELECT 'x'
FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND
t1.priority=5) AND NOT EXISTS ( SELECT 'x' FROM jobqueue t2 WHERE
t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND
t2.jobid!=t0.jobid ) AND NOT EXISTS ( SELECT 'x' FROM prereqevents t3,events
t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name ) ORDER BY t0.docpriority
ASC LIMIT 480;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------
Limit (cost=499234.08..499235.28 rows=480 width=179) (actual
time=23830.242..23830.496 rows=480 loops=1)
-> Sort (cost=499234.08..499858.32 rows=249697 width=179) (actual
time=23830.238..23830.332 rows=480 loops=1)
Sort Key: t0.docpriority
Sort Method: top-N heapsort Memory: 152kB
-> Nested Loop Anti Join (cost=38636.70..486865.47 rows=249697
width=179) (actual time=1694.431..23772.466 r
ows=64682 loops=1)
Join Filter: ((t2.jobid <> t0.jobid) AND ((t2.dochash)::text =
(t0.dochash)::text))
Rows Removed by Join Filter: 40361568
-> Hash Anti Join (cost=38636.70..482464.67 rows=249697
width=179) (actual time=1692.868..4870.315 row
s=64682 loops=1)
Hash Cond: (t0.id = t3.owner)
-> Hash Join (cost=38608.92..479284.46 rows=249700
width=179) (actual time=1692.766..4832.385 ro
ws=64682 loops=1)
Hash Cond: (t0.jobid = t1.id)
-> Bitmap Heap Scan on jobqueue t0
(cost=38452.39..473170.01 rows=1373350 width=179) (actu
al time=1340.245..3427.182 rows=4817107 loops=1)
Recheck Cond: ((status = ANY
('{P,G}'::bpchar[])) AND (checkaction = 'R'::bpchar) AND
(checktime <= 1407246846166::bigint))
-> Bitmap Index Scan on i1392985450174
(cost=0.00..38109.05 rows=1373350 width=0) (a
ctual time=1281.749..1281.749 rows=4826138 loops=1)
Index Cond: ((status = ANY
('{P,G}'::bpchar[])) AND (checkaction = 'R'::bpchar)
AND (checktime <= 1407246846166::bigint))
-> Hash (cost=156.51..156.51 rows=2 width=8)
(actual time=2.025..2.025 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on jobs t1 (cost=0.00..156.51
rows=2 width=8) (actual time=0.041..2.021
rows=1 loops=1)
Filter: ((status = ANY
('{A,a}'::bpchar[])) AND (priority = 5))
Rows Removed by Filter: 33
-> Hash (cost=26.15..26.15 rows=130 width=8) (actual
time=0.005..0.005 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 0kB
-> Hash Join (cost=12.93..26.15 rows=130 width=8)
(actual time=0.004..0.004 rows=0 loops=1
)
Hash Cond: ((t3.eventname)::text =
(t4.name)::text)
-> Seq Scan on prereqevents t3
(cost=0.00..11.40 rows=140 width=524) (actual time=0.
001..0.001 rows=0 loops=1)
-> Hash (cost=11.30..11.30 rows=130
width=516) (never executed)
-> Seq Scan on events t4
(cost=0.00..11.30 rows=130 width=516) (never executed
)
-> Materialize (cost=0.00..31.11 rows=1 width=49) (actual
time=0.000..0.126 rows=624 loops=64682)
-> Index Scan using i1392985450173 on jobqueue t2
(cost=0.00..31.10 rows=1 width=49) (actual tim
e=0.069..0.739 rows=624 loops=1)
Index Cond: (status = ANY
('{A,F,a,f,D,d}'::bpchar[]))
Total runtime: 23832.087 ms
(31 rows)
Revised Query:
explain analyze SELECT
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,
t0.priorityset FROM jobqueue t0
WHERE (t0.status = 'P' OR t0.status =
'G') AND t0.checkaction='R' AND
t0.checktime <= 1407246846166
AND EXISTS (
SELECT 'x' FROM jobs t1
WHERE (t1.status = 'A' OR t1.status = 'a') AND t1.id=t0.jobid AND
t1.priority=5
)
AND NOT EXISTS (
SELECT 'x' FROM jobqueue t2
WHERE t2.dochash=t0.dochash AND (t2.status = 'A' OR t2.status = 'F'
OR t2.status = 'a' OR t2.status = 'f' OR t2.status = 'D' OR t2.status
= 'd')
AND t2.jobid!=t0.jobid
)
AND NOT EXISTS (
SELECT 'x' FROM prereqevents t3,events t4
WHERE t0.id=t3.owner AND t3.eventname=t4.name
)
ORDER BY t0.docpriority ASC
LIMIT 480;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------
Limit (cost=35.17..2868.75 rows=480 width=179) (actual time=2.004..136.828
rows=480 loops=1)
-> Nested Loop Anti Join (cost=35.17..1395679.74 rows=236418 width=179)
(actual time=2.002..136.601 rows=480 loops
=1)
Join Filter: ((t2.jobid <> t0.jobid) AND ((t2.dochash)::text =
(t0.dochash)::text))
Rows Removed by Join Filter: 304320
-> Nested Loop Anti Join (cost=0.00..1391503.23 rows=236418
width=179) (actual time=0.819..3.286 rows=480 lo
ops=1)
-> Nested Loop Semi Join (cost=0.00..1185367.86 rows=236421
width=179) (actual time=0.808..2.009 rows=
480 loops=1)
Join Filter: (t0.jobid = t1.id)
-> Index Scan using i1392985450172 on jobqueue t0
(cost=0.00..1146201.81 rows=1300315 width=179)
(actual time=0.746..1.385 rows=480 loops=1)
Index Cond: ((checkaction = 'R'::bpchar) AND
(checktime <= 1407246846166::bigint))
Filter: ((status = 'P'::bpchar) OR (status =
'G'::bpchar))
Rows Removed by Filter: 633
-> Materialize (cost=0.00..156.60 rows=2 width=8)
(actual time=0.000..0.000 rows=1 loops=480)
-> Seq Scan on jobs t1 (cost=0.00..156.59 rows=2
width=8) (actual time=0.045..0.045 rows=1
loops=1)
Filter: ((priority = 5) AND ((status =
'A'::bpchar) OR (status = 'a'::bpchar)))
Rows Removed by Filter: 18
-> Nested Loop (cost=0.00..0.86 rows=1 width=8) (actual
time=0.002..0.002 rows=0 loops=480)
-> Index Scan using i1392985450178 on prereqevents t3
(cost=0.00..0.27 rows=1 width=524) (actual
time=0.001..0.001 rows=0 loops=480)
Index Cond: (t0.id = owner)
-> Index Only Scan using events_pkey on events t4
(cost=0.00..0.58 rows=1 width=516) (never exec
uted)
Index Cond: (name = (t3.eventname)::text)
Heap Fetches: 0
-> Materialize (cost=35.17..39.20 rows=1 width=49) (actual
time=0.001..0.124 rows=634 loops=480)
-> Bitmap Heap Scan on jobqueue t2 (cost=35.17..39.19 rows=1
width=49) (actual time=0.261..0.699 rows=
634 loops=1)
Recheck Cond: ((status = 'A'::bpchar) OR (status =
'F'::bpchar) OR (status = 'a'::bpchar) OR (stat
us = 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar))
-> BitmapOr (cost=35.17..35.17 rows=1 width=0) (actual
time=0.249..0.249 rows=0 loops=1)
-> Bitmap Index Scan on i1392985450173
(cost=0.00..5.86 rows=1 width=0) (actual time=0.215
..0.215 rows=1000 loops=1)
Index Cond: (status = 'A'::bpchar)
-> Bitmap Index Scan on i1392985450173
(cost=0.00..5.86 rows=1 width=0) (actual time=0.012
..0.012 rows=2 loops=1)
Index Cond: (status = 'F'::bpchar)
-> Bitmap Index Scan on i1392985450173
(cost=0.00..5.86 rows=1 width=0) (actual time=0.005
..0.005 rows=0 loops=1)
Index Cond: (status = 'a'::bpchar)
-> Bitmap Index Scan on i1392985450173
(cost=0.00..5.86 rows=1 width=0) (actual time=0.004
..0.004 rows=0 loops=1)
Index Cond: (status = 'f'::bpchar)
-> Bitmap Index Scan on i1392985450173
(cost=0.00..5.86 rows=1 width=0) (actual time=0.004
..0.004 rows=0 loops=1)
Index Cond: (status = 'D'::bpchar)
-> Bitmap Index Scan on i1392985450173
(cost=0.00..5.86 rows=1 width=0) (actual time=0.004
..0.004 rows=0 loops=1)
Index Cond: (status = 'd'::bpchar)
Total runtime: 137.195 ms
(38 rows)
> Database performance optimization
> ---------------------------------
>
> Key: CONNECTORS-1027
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1027
> Project: ManifoldCF
> Issue Type: Task
> Components: Framework core
> Affects Versions: ManifoldCF 1.7
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.0
>
>
> Some database queries seem to perform better against postgresql in different
> form. For example:
> {code}
> SELECT
> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,
> t0.priorityset FROM jobqueue t0
> WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime
> <= 1407246846166
> AND EXISTS (
> SELECT 'x' FROM jobs t1
> WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5
> )
> AND NOT EXISTS (
> SELECT 'x' FROM jobqueue t2
> WHERE t2.dochash=t0.dochash AND t2.status IN
> ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid
> )
> AND NOT EXISTS (
> SELECT 'x' FROM prereqevents t3,events t4
> WHERE t0.id=t3.owner AND t3.eventname=t4.name
> )
> ORDER BY t0.docpriority ASC
> LIMIT 480;
> {code}
> ...apparently performs better on some versions of postgresql when written
> like this:
> {code}
> SELECT
> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,
> t0.priorityset FROM jobqueue t0
> WHERE (t0.status = 'P' OR t0.status = 'G') AND t0.checkaction='R' AND
> t0.checktime <= 1407246846166
> AND EXISTS (
> SELECT 'x' FROM jobs t1
> WHERE (t1.status = 'A' OR t1.status = 'a') AND t1.id=t0.jobid AND
> t1.priority=5
> )
> AND NOT EXISTS (
> SELECT 'x' FROM jobqueue t2
> WHERE t2.dochash=t0.dochash AND (t2.status = 'A' OR t2.status = 'F'
> OR t2.status = 'a' OR t2.status = 'f' OR t2.status = 'D' OR t2.status
> = 'd')
> AND t2.jobid!=t0.jobid
> )
> AND NOT EXISTS (
> SELECT 'x' FROM prereqevents t3,events t4
> WHERE t0.id=t3.owner AND t3.eventname=t4.name
> )
> ORDER BY t0.docpriority ASC
> LIMIT 480;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)