Hi,
I was working on optimizing the query - "query_bug_packages". I have a
small doubt regarding the query:
In the sub-query:
SELECT s.source, b.component, s.homepage, s.vcs_browser, s.maintainer,
s.version, row_number() OVER (PARTITION BY s.source ORDER BY s.version DESC)
FROM blends_dependencies b
JOIN packages p ON p.package = b.package
JOIN bugs bu ON bu.source = p.source
JOIN sources s ON s.source = p.source
WHERE blend = $1 AND b.distribution = 'debian'
GROUP BY s.source, b.dependency, b.component, s.homepage,
s.vcs_browser, s.maintainer, s.version
) sources -- check status of dependency relation because only
suggested packages are less important for bugs sentinel
we are already using PARTITION BY and selecting "row_number=1", then why
exactly do we need GROUP BY ? I ran "EXPLAIN ANALYZE" on the query and it
shows that sorting according to GROUP_BY consumes alot of time as
postgresql incorrectly estimates the no. of rows it can get. Below is the
relevant snippet of the "EXPLAIN ANALYZE select ...":
Group Key: s.source, b.dependency, b.component, s.homepage, s.vcs_browser,
s.maintainer, s.version
-> Sort (cost=1562098.14..1573387.76 rows=4515849 width=182) (actual
time=9957627.487..11431481.052 rows=32929272 loops=1)
Sort Key: s.source, b.dependency, b.component, s.homepage,
s.vcs_browser, s.maintainer, s.version
Sort Method: external merge Disk: 5797768kB
-> Merge Join (cost=179177.90..260321.57 rows=4515849
width=182) (actual time=8609.923..146112.242 rows=32929272 loops=1)
Merge Cond: (bu.source = p.source)
-> Index Only Scan using bugs_source_idx on bugs bu
(cost=0.42..13573.81 rows=88898 width=9) (actual time=0.030..194.105
rows=88653 loops=1)
Heap Fetches: 88653
-> Materialize (cost=179177.48..183443.32 rows=220577
width=192) (actual time=8609.819..47071.992 rows=32944075 loops=1)
-> Merge Join (cost=179177.48..182891.88 rows=220577
width=192) (actual time=8609.813..12251.926 rows=283633 loops=1)
Merge Cond: (s.source = p.source)
-> Sort (cost=27674.48..27877.38 rows=81159
width=175) (actual time=580.788..3621.560 rows=80830 loops=1)
Sort Key: s.source
Sort Method: external merge Disk: 12712kB
-> Seq Scan on sources s (cost=0.00..14119.59
rows=81159 width=175) (actual time=0.191..84.195 rows=81159 loops=1)
-> Sort (cost=151503.00..151647.91 rows=57963 width=17)
(actual time=8028.629..8141.948 rows=283630 loops=1)
I tried indexing, but udd already has very well-defined indexes. I also
tried changing the join conditions but there was not much improvement. I
maybe wrong in my analysis but if you could help me understand the query a
little better, it would be of great help.
PFA the ouput of "EXPLAIN ANALYZE" for "query_bug_packages".
Thanking You,
Akshita Jha
On Wed, Mar 11, 2015 at 4:10 PM, Andreas Tille <[email protected]> wrote:
> On Wed, Mar 11, 2015 at 01:55:41PM +0530, Akshita Jha wrote:
> > Hi,
> >
> > Sorry for the delay.
>
> No need to sorry.
>
> > > ...
> > > If you enjoy some optimising of my initial query which was quite naive
> > > without any speed optimisation in mind this would be the final step to
> > > replace the original bugs.py (which needed actually replacing *because*
> > > of speed considerations since it was doing everything in very small
> > > queries to a remote host once you go into production were UDD runs
> > > remotely).
> > >
> >
> > Has there been any progress on this ? Should I start working on if its
> not
> > too late ?
>
> Its not to late. At least I did not spent any time in optimising the
> query. I pointed other GSoC candidates to the discussion but no commit
> to Git means for me no solution.
>
> > Thanks alot.
>
> Thanks to you
>
> Andreas.
>
> --
> http://fam-tille.de
>
>
> --
> To UNSUBSCRIBE, email to [email protected]
> with a subject of "unsubscribe". Trouble? Contact
> [email protected]
> Archive: https://lists.debian.org/[email protected]
>
>
--
Akshita Jha
Explain Analyze output:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2438339.43..2460667.27 rows=1275877 width=190) (actual
time=11480841.565..11480841.890 rows=893 loops=1)
-> Sort (cost=2438339.43..2441529.12 rows=1275877 width=190) (actual
time=11480841.565..11480841.608 rows=893 loops=1)
Sort Key: sources.source, tmp.task, (CASE WHEN (((tmp.dependency =
'd'::bpchar) OR (tmp.dependency = 'r'::bpchar)) AND (sources.component =
'main'::text) AND (exp.experimental_flag > 0)) THEN 'depends'::text ELSE
'suggests'::text END), sources.homepage, (CASE WHEN (sources.vcs_browser IS
NULL) THEN '#'::text ELSE sources.vcs_browser END), sources.maintainer
Sort Method: quicksort Memory: 236kB
-> Merge Left Join (cost=2017319.57..2073451.99 rows=1275877
width=190) (actual time=11480825.403..11480840.867 rows=893 loops=1)
Merge Cond: (sources.source = exp.source)
-> Merge Left Join (cost=1999538.63..2023746.53 rows=11037
width=186) (actual time=11479432.519..11479437.375 rows=893 loops=1)
Merge Cond: (sources.source = tmp.source)
-> Subquery Scan on sources (cost=1681473.33..1686852.70
rows=828 width=171) (actual time=11465988.190..11465990.119 rows=669 loops=1)
Filter: (sources.row_number = 1)
Rows Removed by Filter: 1775
-> WindowAgg (cost=1681473.33..1684783.71
rows=165519 width=182) (actual time=11465963.684..11465965.370 rows=2444
loops=1)
-> Sort (cost=1681473.33..1681887.13
rows=165519 width=182) (actual time=11465756.505..11465756.773 rows=2444
loops=1)
Sort Key: s.source, s.version
Sort Method: quicksort Memory: 663kB
-> Group (cost=1562098.14..1652415.12
rows=165519 width=182) (actual time=9957627.514..11465734.616 rows=2444 loops=1)
Group Key: s.source, b.dependency,
b.component, s.homepage, s.vcs_browser, s.maintainer, s.version
-> Sort
(cost=1562098.14..1573387.76 rows=4515849 width=182) (actual
time=9957627.487..11431481.052 rows=32929272 loops=1)
Sort Key: s.source,
b.dependency, b.component, s.homepage, s.vcs_browser, s.maintainer, s.version
Sort Method: external merge
Disk: 5797768kB
-> Merge Join
(cost=179177.90..260321.57 rows=4515849 width=182) (actual
time=8609.923..146112.242 rows=32929272 loops=1)
Merge Cond: (bu.source
= p.source)
-> Index Only Scan
using bugs_source_idx on bugs bu (cost=0.42..13573.81 rows=88898 width=9)
(actual time=0.030..194.105 rows=88653 loops=1)
Heap Fetches:
88653
-> Materialize
(cost=179177.48..183443.32 rows=220577 width=192) (actual
time=8609.819..47071.992 rows=32944075 loops=1)
-> Merge Join
(cost=179177.48..182891.88 rows=220577 width=192) (actual
time=8609.813..12251.926 rows=283633 loops=1)
Merge
Cond: (s.source = p.source)
-> Sort
(cost=27674.48..27877.38 rows=81159 width=175) (actual time=580.788..3621.560
rows=80830 loops=1)
Sort
Key: s.source
Sort
Method: external merge Disk: 12712kB
->
Seq Scan on sources s (cost=0.00..14119.59 rows=81159 width=175) (actual
time=0.191..84.195 rows=81159 loops=1)
-> Sort
(cost=151503.00..151647.91 rows=57963 width=17) (actual time=8028.629..8141.948
rows=283630 loops=1)
Sort
Key: p.source
Sort
Method: quicksort Memory: 3987kB
->
Hash Join (cost=178.42..146917.30 rows=57963 width=17) (actual
time=16.339..7816.764 rows=47766 loops=1)
Hash Cond: (p.package = b.package)
-> Seq Scan on packages p (cost=0.00..121369.75 rows=1239475 width=24)
(actual time=14.406..7263.532 rows=1239566 loops=1)
-> Hash (cost=159.64..159.64 rows=1503 width=18) (actual time=1.896..1.896
rows=1498 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 78kB
-> Bitmap Heap Scan on blends_dependencies b (cost=60.65..159.64
rows=1503 width=18) (actual time=0.401..1.282 rows=1498 loops=1)
Recheck Cond: (blend = 'debian-edu'::text)
Filter: (distribution = 'debian'::text)
Rows Removed by Filter: 101
Heap Blocks: exact=41
-> Bitmap Index Scan on blends_dependencies_pkey
(cost=0.00..60.27 rows=1599 width=0) (actual time=0.369..0.369 rows=1599
loops=1)
Index Cond: (blend = 'debian-edu'::text)
-> Materialize (cost=318065.29..336732.87 rows=2666
width=25) (actual time=13434.208..13435.482 rows=1093 loops=1)
-> Subquery Scan on tmp (cost=318065.29..336726.21
rows=2666 width=25) (actual time=13434.186..13435.212 rows=1093 loops=1)
Filter: (tmp.row_number = 1)
Rows Removed by Filter: 45
-> WindowAgg (cost=318065.29..330061.60
rows=533169 width=29) (actual time=13434.174..13435.031 rows=1138 loops=1)
-> Sort (cost=318065.29..319398.22
rows=533169 width=29) (actual time=13434.165..13434.278 rows=1138 loops=1)
Sort Key: p_1.source, b_1.task,
bdp.priority
Sort Method: quicksort Memory:
138kB
-> Group
(cost=247927.52..254592.14 rows=533169 width=29) (actual
time=13385.821..13429.323 rows=1138 loops=1)
Group Key: p_1.source,
b_1.task, bdp.dependency, bdp.priority
-> Sort
(cost=247927.52..249260.45 rows=533169 width=29) (actual
time=13385.817..13417.367 rows=58004 loops=1)
Sort Key: p_1.source,
b_1.task, bdp.dependency, bdp.priority
Sort Method: external
merge Disk: 2184kB
-> Merge Join
(cost=176446.38..184454.37 rows=533169 width=29) (actual
time=12486.560..12524.167 rows=58004 loops=1)
Merge Cond:
(bdp.dependency = b_1.dependency)
-> Sort
(cost=146.16..151.38 rows=2090 width=9) (actual time=0.088..0.091 rows=5
loops=1)
Sort Key:
bdp.dependency
Sort
Method: quicksort Memory: 25kB
-> Seq
Scan on blends_dependencies_priorities bdp (cost=0.00..30.90 rows=2090
width=9) (actual time=0.041..0.045 rows=5 loops=1)
-> Sort
(cost=176300.23..176427.78 rows=51021 width=22) (actual
time=12477.896..12488.461 rows=58004 loops=1)
Sort Key:
b_1.dependency
Sort
Method: external sort Disk: 2112kB
-> Hash
Join (cost=15988.62..172310.69 rows=51021 width=22) (actual
time=2149.175..12190.009 rows=58004 loops=1)
Hash
Cond: ((p_1.source = s_1.source) AND (p_1.release = s_1.release))
->
Hash Join (cost=175.65..148500.90 rows=61666 width=29) (actual
time=380.202..10269.907 rows=47770 loops=1)
Hash Cond: (p_1.package = b_1.package)
-> Seq Scan on packages p_1 (cost=0.00..121369.75 rows=1239475 width=31)
(actual time=349.165..9463.330 rows=1239566 loops=1)
-> Hash (cost=155.66..155.66 rows=1599 width=23) (actual time=2.994..2.994
rows=1599 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 93kB
-> Bitmap Heap Scan on blends_dependencies b_1 (cost=60.67..155.66
rows=1599 width=23) (actual time=0.988..1.885 rows=1599 loops=1)
Recheck Cond: (blend = 'debian-edu'::text)
Heap Blocks: exact=41
-> Bitmap Index Scan on blends_dependencies_pkey
(cost=0.00..60.27 rows=1599 width=0) (actual time=0.948..0.948 rows=1599
loops=1)
Index Cond: (blend = 'debian-edu'::text)
->
Hash (cost=14119.59..14119.59 rows=81159 width=19) (actual
time=1739.343..1739.343 rows=81159 loops=1)
Buckets: 8192 Batches: 2 Memory Usage: 2064kB
-> Seq Scan on sources s_1 (cost=0.00..14119.59 rows=81159 width=19) (actual
time=6.442..1632.060 rows=81159 loops=1)
-> Sort (cost=17780.94..17838.74 rows=23120 width=17) (actual
time=1392.848..1395.171 rows=24971 loops=1)
Sort Key: exp.source
Sort Method: quicksort Memory: 2289kB
-> Subquery Scan on exp (cost=15642.70..16105.10
rows=23120 width=17) (actual time=1268.256..1281.538 rows=24839 loops=1)
-> HashAggregate (cost=15642.70..15873.90
rows=23120 width=17) (actual time=1268.254..1277.493 rows=24839 loops=1)
Group Key: s_2.source
-> Hash Join (cost=1.38..15236.91 rows=81159
width=17) (actual time=12.624..1144.854 rows=79312 loops=1)
Hash Cond: (s_2.release = r.release)
-> Seq Scan on sources s_2
(cost=0.00..14119.59 rows=81159 width=19) (actual time=12.576..1034.229
rows=81159 loops=1)
-> Hash (cost=1.17..1.17 rows=17
width=36) (actual time=0.017..0.017 rows=17 loops=1)
Buckets: 1024 Batches: 1 Memory
Usage: 1kB
-> Seq Scan on releases r
(cost=0.00..1.17 rows=17 width=36) (actual time=0.008..0.011 rows=17 loops=1)
Planning time: 14.264 ms
Execution time: 11481139.274 ms
(97 rows)