Re: [Django] #25230: Change to Query.get_count() causes big performance hit

Django Thu, 06 Aug 2015 14:57:06 -0700

#25230: Change to Query.get_count() causes big performance hit
-------------------------------------+-------------------------------------
     Reporter:  dexity               |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  1.8
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------


Comment (by charettes):

 I do get a non-negligible slowdown on PostgreSQL with a fully analyzed
 table containing ~30M rows.

 {{{
 database=# EXPLAIN ANALYZE SELECT COUNT('*') FROM (SELECT DISTINCT * FROM schema.table) subquery;
                                   QUERY PLAN
 ------------------------------------------------------------------------------
  Aggregate  (cost=6213779.77..6213779.78 rows=1 width=0) (actual time=88610.804..88610.804 rows=1 loops=1)
    ->  Unique  (cost=5290067.41..5858505.79 rows=28421919 width=117) (actual time=53265.590..86513.151 rows=28176350 loops=1)
          ->  Sort  (cost=5290067.41..5361122.20 rows=28421919 width=117) (actual time=53265.588..76718.379 rows=28176350 loops=1)
                Sort Key: table.col0, table.col1, table.col2, table.col2, table.col3, table.col4, table.col5
                Sort Method: external merge  Disk: 1296744kB
                ->  Seq Scan on table  (cost=0.00..522350.19 rows=28421919 width=117) (actual time=0.840..15830.723 rows=28176350 loops=1)
  Total runtime: 88695.934 ms
 (7 rows)

 database=# EXPLAIN ANALYZE SELECT COUNT(*) FROM schema.table;
                                   QUERY PLAN
 ------------------------------------------------------------------------------
  Aggregate  (cost=593404.99..593405.00 rows=1 width=0) (actual time=19139.072..19139.073 rows=1 loops=1)
    ->  Seq Scan on table  (cost=0.00..522350.19 rows=28421919 width=0) (actual time=5.683..17387.798 rows=28176350 loops=1)
  Total runtime: 19139.108 ms
 (3 rows)
 }}}
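
 To put the two plans side by side, here is a small sketch that only
 re-derives figures from the EXPLAIN ANALYZE output above (the total
 runtimes and the on-disk sort size). The variable names are illustrative
 and not part of the ticket.

```python
# Figures copied from the EXPLAIN ANALYZE output above.
subquery_runtime_ms = 88695.934  # COUNT('*') over the SELECT DISTINCT * subquery
plain_runtime_ms = 19139.108     # plain COUNT(*)
sort_spill_kb = 1296744          # "Sort Method: external merge  Disk: 1296744kB"

# Relative and absolute cost of the subquery form on this table.
slowdown = subquery_runtime_ms / plain_runtime_ms
extra_seconds = (subquery_runtime_ms - plain_runtime_ms) / 1000

# Size of the sort that spilled to disk, converted from kB to GiB.
sort_spill_gib = sort_spill_kb / 1024 ** 2

print(f"slowdown: {slowdown:.2f}x")
print(f"extra time: {extra_seconds:.1f} s")
print(f"sort spilled to disk: {sort_spill_gib:.2f} GiB")
```

 On these numbers the subquery form is roughly 4.6 times slower, about 70
 seconds more per count, and the extra time sits in the Sort and Unique
 nodes, which the plain COUNT(*) plan does not have.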

 Should we escalate this to a release blocker?
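
 For context, a minimal sketch of why the extra work may be avoidable. It
 assumes the table has a primary key, so that every row is already
 distinct; that is my assumption and is not stated above. The `item` table
 and its columns are hypothetical, and SQLite is used only so the sketch
 runs without a PostgreSQL server. It compares the results of the two
 query shapes and says nothing about PostgreSQL timings.

```python
import sqlite3

# Hypothetical stand-in table with a primary key (assumption, see lead-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, col0 TEXT, col1 INTEGER)")
conn.executemany(
    "INSERT INTO item (col0, col1) VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", 2)],  # non-key columns repeat, ids do not
)

# Query shape from the first plan: count over a DISTINCT subquery.
distinct_count = conn.execute(
    "SELECT COUNT('*') FROM (SELECT DISTINCT * FROM item) subquery"
).fetchone()[0]

# Query shape from the second plan: plain count.
plain_count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]

print(distinct_count, plain_count)
conn.close()
```

 Under that assumption both queries return the same number, so the Sort
 and Unique steps in the first plan add cost without changing the result.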

--
Ticket URL: <https://code.djangoproject.com/ticket/25230#comment:6>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
