[Django] #34750: Group by removal optimization generates in correct count query

Django Fri, 28 Jul 2023 08:57:21 -0700

#34750: Group by removal optimization generates in correct count query
-----------------------------------------+------------------------
               Reporter:  Toan Vuong     |          Owner:  nobody
                   Type:  Uncategorized  |         Status:  new
              Component:  Uncategorized  |        Version:  4.2
               Severity:  Normal         |       Keywords:
           Triage Stage:  Unreviewed     |      Has patch:  0
    Needs documentation:  0              |    Needs tests:  0
Patch needs improvement:  0              |  Easy pickings:  0
                  UI/UX:  0              |
-----------------------------------------+------------------------
 I believe this is related to the work in
 https://code.djangoproject.com/ticket/28477 (and follow-up issues/changes
 due to that ticket).


 Tested on:
 Django 4.2.2
 OS X 13.4.1
 Python 3.9.16

 For the Oracle backend:
 cx-Oracle 8.3.0 with instantclient 19.8

 For the Postgres backend:
 psycopg/psycopg-binary 3.1.9


 Attached is a sample project, and the relevant query is below:

 {{{
 def populate_data():
     q1 = Question(question_text='t1')
     q1.save()

     for i in range(10):
         c = Choice(question=q1, choice_text='c1', votes=i)
         c.save()

 # Need to populate the data only once, so comment it out on subsequent
 runs
 populate_data()

 qs = (
         Question.objects.values('id', 'question_text')
             .annotate(mysum=Sum('choice__votes'))
             .annotate(choice__votes_threshold=Case(
                 When(choice__votes__gt=1, then=Value(1000)),
                 default=Value(-1)))
 )
 print(qs.count() == len(qs))

 }}}


 When issuing the count query (`qs.count()`), Django generates this:

 {{{
 SELECT COUNT(*) FROM (SELECT "polls_question"."id" AS "col1",
 "polls_question"."question_text" AS "col2" FROM "polls_question" LEFT
 OUTER JOIN "polls_choice" ON ("polls_question"."id" =
 "polls_choice"."question_id") GROUP BY 1
 }}}

 I've chased it down to
 [https://github.com/django/django/blob/main/django/db/models/sql/query.py#L490
 this optimization]. This count would return 1 because it's just a simple
 join.

 However, the actual query is this:

 {{{
 SELECT "polls_question"."id", "polls_question"."question_text",
 SUM("polls_choice"."votes") AS "mysum", CASE WHEN "polls_choice"."votes" >
 1 THEN 1000 ELSE  -1 END AS "choice__votes_threshold" FROM
 "polls_question" LEFT OUTER JOIN "polls_choice" ON ("polls_question"."id"
 = "polls_choice"."question_id") GROUP BY "polls_question"."id", 4
 }}}

 Due to the group by and the varying `choice__votes_threshold`, there
 should be 2 rows in the result set.

 This _did_ used to work in 3.2.18, and I didn't dig too much into it but I
 suspect the optimization was introduced after that version, hence why it
 worked.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/34750>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/010701899d37ec2d-69d77806-5755-43d5-b899-580452e7994d-000000%40eu-central-1.amazonses.com.

[Django] #34750: Group by removal optimization generates in correct count query

Reply via email to