#30685: Suboptimal QuerySet.distinct().count()
-------------------------------------+-------------------------------------
               Reporter:  adamsol    |          Owner:  nobody
                   Type:             |         Status:  new
  Cleanup/optimization               |
              Component:  Database   |        Version:  2.2
  layer (models, ORM)                |
               Severity:  Normal     |       Keywords:
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 I have a PostgreSQL table with 100 000 records and 15 columns.

 A simple `.count()` query produces fast SQL (execution time in seconds
 on the left):

 {{{(0.015) SELECT COUNT(*) AS "__count" FROM "table";}}}

 When we add `.distinct()`, the count is wrapped in a subquery that
 SELECTs all columns:

 {{{(0.178) SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS Col1, ...
 (15 columns) FROM "table") subquery;}}}

 If we replace `.distinct()` with `.distinct('id')` and add
 `.order_by('id', 'col1', 'col2')`, the subquery is additionally ORDERed:

 {{{(0.151) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("table"."id")
 "table"."id" AS Col1, ... (15 columns) FROM "table" ORDER BY "table"."id"
 ASC, "table"."col1" ASC, "table"."col2" ASC) subquery;}}}

 Oddly, without `.distinct('id')` we can write
 `.order_by('non_existing_column')` and no exception is raised.

 After adding `.values('id')` and an empty `.order_by()`, the query is as
 fast as it can be with DISTINCT:

 {{{(0.053) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("table"."id")
 "table"."id" AS Col1 FROM "table") subquery;}}}
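
 The workaround above can be wrapped in a small helper. This is only a
 sketch: the name `narrow_distinct_count` is hypothetical, and it assumes
 a Django QuerySet on PostgreSQL (where `.distinct('id')` is supported)
 and a unique field such as the primary key.

```python
def narrow_distinct_count(queryset, field="id"):
    """Count distinct rows through a single-column, unordered subquery.

    Assumption: `queryset` is a Django QuerySet on PostgreSQL and `field`
    is unique (e.g. the primary key). With a non-unique field the result
    is the number of distinct values of that field instead.
    """
    return (
        queryset
        .distinct(field)  # DISTINCT ON the given field, as in the report
        .values(field)    # keep only that column in the subquery
        .order_by()       # an empty order_by() clears any ordering
        .count()
    )
```

 With a hypothetical model `MyModel`, this would be called as
 `narrow_distinct_count(MyModel.objects.all())`.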

 I think that the subquery for `count()` should never contain additional
 columns or be ordered (and the same probably goes for other
 aggregations).
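
 The claim that the extra columns and the ordering do not affect the
 result can be checked outside Django. The sketch below uses an in-memory
 SQLite database as a stand-in for PostgreSQL; the table name `items` and
 its columns are made up for illustration, and it compares only the
 counts, not timings.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL table from the report.
# The table "items" and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)"
)
# Non-key columns repeat on purpose; only "id" is unique.
rows = [(i, "a" if i % 2 else "b", "x") for i in range(1, 1001)]
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


plain_count = scalar("SELECT COUNT(*) FROM items")

# Wide, ordered subquery: every column is selected and sorted.
wide_count = scalar(
    "SELECT COUNT(*) FROM ("
    "SELECT DISTINCT id, col1, col2 FROM items ORDER BY id, col1, col2"
    ") AS subquery"
)

# Narrow, unordered subquery: only the unique column is selected.
narrow_count = scalar(
    "SELECT COUNT(*) FROM (SELECT DISTINCT id FROM items) AS subquery"
)

print(plain_count, wide_count, narrow_count)
```

 Because `id` is unique, all three counts agree, so the wider subquery
 only adds work for the database.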

-- 
Ticket URL: <https://code.djangoproject.com/ticket/30685>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
