Hi,

I was working on optimizing the query - "query_bug_packages". I have a
small doubt regarding the query:

In the sub-query:

SELECT s.source, b.component, s.homepage, s.vcs_browser, s.maintainer,
s.version, row_number() OVER (PARTITION BY s.source ORDER BY s.version DESC)
            FROM blends_dependencies b
            JOIN packages p ON p.package = b.package
            JOIN bugs bu    ON bu.source = p.source
            JOIN sources s  ON s.source  = p.source
            WHERE blend = $1 AND b.distribution = 'debian'
            GROUP BY s.source, b.dependency, b.component, s.homepage,
s.vcs_browser, s.maintainer, s.version
        ) sources -- check status of dependency relation because only
suggested packages are less important for bugs sentinel

we are already using PARTITION BY and selecting "row_number=1", then why
exactly do we need GROUP BY ? I ran "EXPLAIN ANALYZE" on the query and it
shows that sorting according to GROUP_BY consumes alot of time as
postgresql incorrectly estimates the no. of rows it can get. Below is the
relevant snippet of the "EXPLAIN ANALYZE select ...":

Group Key: s.source, b.dependency, b.component, s.homepage, s.vcs_browser,
s.maintainer, s.version
    ->  Sort  (cost=1562098.14..1573387.76 rows=4515849 width=182) (actual
time=9957627.487..11431481.052 rows=32929272 loops=1)
            Sort Key: s.source, b.dependency, b.component, s.homepage,
s.vcs_browser, s.maintainer, s.version
            Sort Method: external merge  Disk: 5797768kB
            ->  Merge Join  (cost=179177.90..260321.57 rows=4515849
width=182) (actual time=8609.923..146112.242 rows=32929272 loops=1)
                Merge Cond: (bu.source = p.source)
                ->  Index Only Scan using bugs_source_idx on bugs bu
 (cost=0.42..13573.81 rows=88898 width=9) (actual time=0.030..194.105
rows=88653 loops=1)
                Heap Fetches: 88653
                ->  Materialize  (cost=179177.48..183443.32 rows=220577
width=192) (actual time=8609.819..47071.992 rows=32944075 loops=1)
                ->  Merge Join  (cost=179177.48..182891.88 rows=220577
width=192) (actual time=8609.813..12251.926 rows=283633 loops=1)
                    Merge Cond: (s.source = p.source)
                    ->  Sort  (cost=27674.48..27877.38 rows=81159
width=175) (actual time=580.788..3621.560 rows=80830 loops=1)
                        Sort Key: s.source
                        Sort Method: external merge  Disk: 12712kB
                        ->  Seq Scan on sources s  (cost=0.00..14119.59
rows=81159 width=175) (actual time=0.191..84.195 rows=81159 loops=1)
                ->  Sort  (cost=151503.00..151647.91 rows=57963 width=17)
(actual time=8028.629..8141.948 rows=283630 loops=1)


I tried indexing, but udd already has very well-defined indexes. I also
tried changing the join conditions but there was not much improvement. I
maybe wrong in my analysis but if you could help me understand the query a
little better, it would be of great help.

PFA the ouput of "EXPLAIN ANALYZE" for "query_bug_packages".

Thanking You,
Akshita Jha


On Wed, Mar 11, 2015 at 4:10 PM, Andreas Tille <[email protected]> wrote:

> On Wed, Mar 11, 2015 at 01:55:41PM +0530, Akshita Jha wrote:
> > Hi,
> >
> > Sorry for the delay.
>
> No need to sorry.
>
> > > ...
> > > If you enjoy some optimising of my initial query which was quite naive
> > > without any speed optimisation in mind this would be the final step to
> > > replace the original bugs.py (which needed actually replacing *because*
> > > of speed considerations since it was doing everything in very small
> > > queries to a remote host once you go into production were UDD runs
> > > remotely).
> > >
> >
> > Has there been any progress on this ? Should I start working on if its
> not
> > too late ?
>
> Its not to late.  At least I did not spent any time in optimising the
> query.  I pointed other GSoC candidates to the discussion but no commit
> to Git means for me no solution.
>
> > Thanks alot.
>
> Thanks to you
>
>         Andreas.
>
> --
> http://fam-tille.de
>
>
> --
> To UNSUBSCRIBE, email to [email protected]
> with a subject of "unsubscribe". Trouble? Contact
> [email protected]
> Archive: https://lists.debian.org/[email protected]
>
>


-- 
Akshita Jha
Explain Analyze output:                                                         
                                                                                
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=2438339.43..2460667.27 rows=1275877 width=190) (actual 
time=11480841.565..11480841.890 rows=893 loops=1)
   ->  Sort  (cost=2438339.43..2441529.12 rows=1275877 width=190) (actual 
time=11480841.565..11480841.608 rows=893 loops=1)
         Sort Key: sources.source, tmp.task, (CASE WHEN (((tmp.dependency = 
'd'::bpchar) OR (tmp.dependency = 'r'::bpchar)) AND (sources.component = 
'main'::text) AND (exp.experimental_flag > 0)) THEN 'depends'::text ELSE 
'suggests'::text END), sources.homepage, (CASE WHEN (sources.vcs_browser IS 
NULL) THEN '#'::text ELSE sources.vcs_browser END), sources.maintainer
         Sort Method: quicksort  Memory: 236kB
         ->  Merge Left Join  (cost=2017319.57..2073451.99 rows=1275877 
width=190) (actual time=11480825.403..11480840.867 rows=893 loops=1)
               Merge Cond: (sources.source = exp.source)
               ->  Merge Left Join  (cost=1999538.63..2023746.53 rows=11037 
width=186) (actual time=11479432.519..11479437.375 rows=893 loops=1)
                     Merge Cond: (sources.source = tmp.source)
                     ->  Subquery Scan on sources  (cost=1681473.33..1686852.70 
rows=828 width=171) (actual time=11465988.190..11465990.119 rows=669 loops=1)
                           Filter: (sources.row_number = 1)
                           Rows Removed by Filter: 1775
                           ->  WindowAgg  (cost=1681473.33..1684783.71 
rows=165519 width=182) (actual time=11465963.684..11465965.370 rows=2444 
loops=1)
                                 ->  Sort  (cost=1681473.33..1681887.13 
rows=165519 width=182) (actual time=11465756.505..11465756.773 rows=2444 
loops=1)
                                       Sort Key: s.source, s.version
                                       Sort Method: quicksort  Memory: 663kB
                                       ->  Group  (cost=1562098.14..1652415.12 
rows=165519 width=182) (actual time=9957627.514..11465734.616 rows=2444 loops=1)
                                             Group Key: s.source, b.dependency, 
b.component, s.homepage, s.vcs_browser, s.maintainer, s.version
                                             ->  Sort  
(cost=1562098.14..1573387.76 rows=4515849 width=182) (actual 
time=9957627.487..11431481.052 rows=32929272 loops=1)
                                                   Sort Key: s.source, 
b.dependency, b.component, s.homepage, s.vcs_browser, s.maintainer, s.version
                                                   Sort Method: external merge  
Disk: 5797768kB
                                                   ->  Merge Join  
(cost=179177.90..260321.57 rows=4515849 width=182) (actual 
time=8609.923..146112.242 rows=32929272 loops=1)
                                                         Merge Cond: (bu.source 
= p.source)
                                                         ->  Index Only Scan 
using bugs_source_idx on bugs bu  (cost=0.42..13573.81 rows=88898 width=9) 
(actual time=0.030..194.105 rows=88653 loops=1)
                                                               Heap Fetches: 
88653
                                                         ->  Materialize  
(cost=179177.48..183443.32 rows=220577 width=192) (actual 
time=8609.819..47071.992 rows=32944075 loops=1)
                                                               ->  Merge Join  
(cost=179177.48..182891.88 rows=220577 width=192) (actual 
time=8609.813..12251.926 rows=283633 loops=1)
                                                                     Merge 
Cond: (s.source = p.source)
                                                                     ->  Sort  
(cost=27674.48..27877.38 rows=81159 width=175) (actual time=580.788..3621.560 
rows=80830 loops=1)
                                                                           Sort 
Key: s.source
                                                                           Sort 
Method: external merge  Disk: 12712kB
                                                                           ->  
Seq Scan on sources s  (cost=0.00..14119.59 rows=81159 width=175) (actual 
time=0.191..84.195 rows=81159 loops=1)
                                                                     ->  Sort  
(cost=151503.00..151647.91 rows=57963 width=17) (actual time=8028.629..8141.948 
rows=283630 loops=1)
                                                                           Sort 
Key: p.source
                                                                           Sort 
Method: quicksort  Memory: 3987kB
                                                                           ->  
Hash Join  (cost=178.42..146917.30 rows=57963 width=17) (actual 
time=16.339..7816.764 rows=47766 loops=1)
                                                                                
 Hash Cond: (p.package = b.package)
                                                                                
 ->  Seq Scan on packages p  (cost=0.00..121369.75 rows=1239475 width=24) 
(actual time=14.406..7263.532 rows=1239566 loops=1)
                                                                                
 ->  Hash  (cost=159.64..159.64 rows=1503 width=18) (actual time=1.896..1.896 
rows=1498 loops=1)
                                                                                
       Buckets: 1024  Batches: 1  Memory Usage: 78kB
                                                                                
       ->  Bitmap Heap Scan on blends_dependencies b  (cost=60.65..159.64 
rows=1503 width=18) (actual time=0.401..1.282 rows=1498 loops=1)
                                                                                
             Recheck Cond: (blend = 'debian-edu'::text)
                                                                                
             Filter: (distribution = 'debian'::text)
                                                                                
             Rows Removed by Filter: 101
                                                                                
             Heap Blocks: exact=41
                                                                                
             ->  Bitmap Index Scan on blends_dependencies_pkey  
(cost=0.00..60.27 rows=1599 width=0) (actual time=0.369..0.369 rows=1599 
loops=1)
                                                                                
                   Index Cond: (blend = 'debian-edu'::text)
                     ->  Materialize  (cost=318065.29..336732.87 rows=2666 
width=25) (actual time=13434.208..13435.482 rows=1093 loops=1)
                           ->  Subquery Scan on tmp  (cost=318065.29..336726.21 
rows=2666 width=25) (actual time=13434.186..13435.212 rows=1093 loops=1)
                                 Filter: (tmp.row_number = 1)
                                 Rows Removed by Filter: 45
                                 ->  WindowAgg  (cost=318065.29..330061.60 
rows=533169 width=29) (actual time=13434.174..13435.031 rows=1138 loops=1)
                                       ->  Sort  (cost=318065.29..319398.22 
rows=533169 width=29) (actual time=13434.165..13434.278 rows=1138 loops=1)
                                             Sort Key: p_1.source, b_1.task, 
bdp.priority
                                             Sort Method: quicksort  Memory: 
138kB
                                             ->  Group  
(cost=247927.52..254592.14 rows=533169 width=29) (actual 
time=13385.821..13429.323 rows=1138 loops=1)
                                                   Group Key: p_1.source, 
b_1.task, bdp.dependency, bdp.priority
                                                   ->  Sort  
(cost=247927.52..249260.45 rows=533169 width=29) (actual 
time=13385.817..13417.367 rows=58004 loops=1)
                                                         Sort Key: p_1.source, 
b_1.task, bdp.dependency, bdp.priority
                                                         Sort Method: external 
merge  Disk: 2184kB
                                                         ->  Merge Join  
(cost=176446.38..184454.37 rows=533169 width=29) (actual 
time=12486.560..12524.167 rows=58004 loops=1)
                                                               Merge Cond: 
(bdp.dependency = b_1.dependency)
                                                               ->  Sort  
(cost=146.16..151.38 rows=2090 width=9) (actual time=0.088..0.091 rows=5 
loops=1)
                                                                     Sort Key: 
bdp.dependency
                                                                     Sort 
Method: quicksort  Memory: 25kB
                                                                     ->  Seq 
Scan on blends_dependencies_priorities bdp  (cost=0.00..30.90 rows=2090 
width=9) (actual time=0.041..0.045 rows=5 loops=1)
                                                               ->  Sort  
(cost=176300.23..176427.78 rows=51021 width=22) (actual 
time=12477.896..12488.461 rows=58004 loops=1)
                                                                     Sort Key: 
b_1.dependency
                                                                     Sort 
Method: external sort  Disk: 2112kB
                                                                     ->  Hash 
Join  (cost=15988.62..172310.69 rows=51021 width=22) (actual 
time=2149.175..12190.009 rows=58004 loops=1)
                                                                           Hash 
Cond: ((p_1.source = s_1.source) AND (p_1.release = s_1.release))
                                                                           ->  
Hash Join  (cost=175.65..148500.90 rows=61666 width=29) (actual 
time=380.202..10269.907 rows=47770 loops=1)
                                                                                
 Hash Cond: (p_1.package = b_1.package)
                                                                                
 ->  Seq Scan on packages p_1  (cost=0.00..121369.75 rows=1239475 width=31) 
(actual time=349.165..9463.330 rows=1239566 loops=1)
                                                                                
 ->  Hash  (cost=155.66..155.66 rows=1599 width=23) (actual time=2.994..2.994 
rows=1599 loops=1)
                                                                                
       Buckets: 1024  Batches: 1  Memory Usage: 93kB
                                                                                
       ->  Bitmap Heap Scan on blends_dependencies b_1  (cost=60.67..155.66 
rows=1599 width=23) (actual time=0.988..1.885 rows=1599 loops=1)
                                                                                
             Recheck Cond: (blend = 'debian-edu'::text)
                                                                                
             Heap Blocks: exact=41
                                                                                
             ->  Bitmap Index Scan on blends_dependencies_pkey  
(cost=0.00..60.27 rows=1599 width=0) (actual time=0.948..0.948 rows=1599 
loops=1)
                                                                                
                   Index Cond: (blend = 'debian-edu'::text)
                                                                           ->  
Hash  (cost=14119.59..14119.59 rows=81159 width=19) (actual 
time=1739.343..1739.343 rows=81159 loops=1)
                                                                                
 Buckets: 8192  Batches: 2  Memory Usage: 2064kB
                                                                                
 ->  Seq Scan on sources s_1  (cost=0.00..14119.59 rows=81159 width=19) (actual 
time=6.442..1632.060 rows=81159 loops=1)
               ->  Sort  (cost=17780.94..17838.74 rows=23120 width=17) (actual 
time=1392.848..1395.171 rows=24971 loops=1)
                     Sort Key: exp.source
                     Sort Method: quicksort  Memory: 2289kB
                     ->  Subquery Scan on exp  (cost=15642.70..16105.10 
rows=23120 width=17) (actual time=1268.256..1281.538 rows=24839 loops=1)
                           ->  HashAggregate  (cost=15642.70..15873.90 
rows=23120 width=17) (actual time=1268.254..1277.493 rows=24839 loops=1)
                                 Group Key: s_2.source
                                 ->  Hash Join  (cost=1.38..15236.91 rows=81159 
width=17) (actual time=12.624..1144.854 rows=79312 loops=1)
                                       Hash Cond: (s_2.release = r.release)
                                       ->  Seq Scan on sources s_2  
(cost=0.00..14119.59 rows=81159 width=19) (actual time=12.576..1034.229 
rows=81159 loops=1)
                                       ->  Hash  (cost=1.17..1.17 rows=17 
width=36) (actual time=0.017..0.017 rows=17 loops=1)
                                             Buckets: 1024  Batches: 1  Memory 
Usage: 1kB
                                             ->  Seq Scan on releases r  
(cost=0.00..1.17 rows=17 width=36) (actual time=0.008..0.011 rows=17 loops=1)
 Planning time: 14.264 ms
 Execution time: 11481139.274 ms
(97 rows)

Reply via email to