#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
     Reporter:  mcagl                |                    Owner:  nobody
         Type:  Uncategorized        |                   Status:  new
    Component:  Database layer       |                  Version:  1.7
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by mcagl):

 It seems similar, or at least somewhat related.

 I did a little more inspecting, doing this in {{{./manage.py shell}}} to
 obtain the executed raw SQL statements:

 {{{
 import logging
 l = logging.getLogger('django.db.backends')
 l.setLevel(logging.DEBUG)
 l.addHandler(logging.StreamHandler())
 import datetime
 from decimal import Decimal
 import my_app.models as app_models
 t0 = app_models.Tag.objects.create(name="Tag0")
 t1 = app_models.Tag.objects.create(name="Tag1")
 t2 = app_models.Tag.objects.create(name="Tag2")
 t3 = app_models.Tag.objects.create(name="Tag3")
 app_models.Row.objects.create(day=datetime.date(2014, 1, 1),
 amount=Decimal('10.00'))
 r = app_models.Row.objects.create(day=datetime.date(2014, 1, 2),
 amount=Decimal('10.00'))
 r.tags.add(t0)
 r = app_models.Row.objects.create(day=datetime.date(2014, 1, 3),
 amount=Decimal('10.00'))
 r.tags.add(t1)
 r = app_models.Row.objects.create(day=datetime.date(2014, 1, 4),
 amount=Decimal('10.00'))
 r.tags.add(t0, t1)
 r = app_models.Row.objects.create(day=datetime.date(2014, 1, 5),
 amount=Decimal('10.00'))
 r.tags.add(t2, t3)
 }}}

 so mainly adding a DEBUG level logger for SQL queries, and recreating the
 objects in the setUp() method of the TestCase in the github repository.
 Then I tried the queries, obtaining this output, first without
 {{{distinct()}}}:
 {{{
 rows = app_models.Row.objects.filter(tags__in=[t0, t1]
 list(rows)
 }}}

 {{{
 >>> (0.000) QUERY = u'SELECT "my_app_row"."id", "my_app_row"."amount",
 "my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON (
 "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE
 "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
 >>> [<Row: Row object>, <Row: Row object>, <Row: Row object>, <Row: Row
 object>]
 }}}

 {{{
 agg = rows.aggregate(Sum('amount'))
 print agg['amount__sum']
 }}}

 {{{
 >>> (0.000) QUERY = u'SELECT SUM("my_app_row"."amount") AS "amount__sum"
 FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" =
 "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)'
 - PARAMS = (1, 2); args=(1, 2)
 >>> 40.00
 }}}

 Then I added {{{distinct()}}}, which gets added as expected (and the rows
 queryset gets 3 objects instead of 4, also expected):

 {{{
 rows = app_models.Row.objects.filter(tags__in=[t0, t1]).distinct()
 list(rows)
 }}}

 {{{
 >>> (0.000) QUERY = u'SELECT DISTINCT "my_app_row"."id",
 "my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN
 "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" )
 WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1,
 2)
 >>> [<Row: Row object>, <Row: Row object>, <Row: Row object>]
 }}}

 but somewhat fails when doint aggregation in this version of the queryset,
 because the {{{DISTINCT}}} is added before {{{SUM}}}, whereas I expect
 that {{{distinct()}}} gets executed before doing the aggregation, as it's
 the python code I wrote:

 {{{
 agg = rows.aggregate(Sum('amount'))
 print agg['amount__sum']
 }}}

 {{{
 >>> (0.000) QUERY = u'SELECT DISTINCT SUM("my_app_row"."amount") AS
 "amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON (
 "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE
 "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
 >>> 40.00
 }}}

 And obviously {{{agg['amount__sum']}}} contains an unexpected value,
 because at an higher level it's like the Row object with two tags is
 counted twice, completely disregarding the {{{distinct()}}} method I
 called before aggregating.


 Thanks for the quick response, I hope that this can be useful,
 Mark

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/063.b522644ff0cac109cdf89b40581b1ae1%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to