#16025: distinct does not apply to aggregated querysets
------------------------------------+---------------------------------
               Reporter:  micolous  |          Owner:  nobody
                   Type:  Bug       |         Status:  new
              Milestone:            |      Component:  ORM aggregation
                Version:  1.3       |       Severity:  Normal
             Resolution:            |       Keywords:
           Triage Stage:  Accepted  |      Has patch:  0
    Needs documentation:  0         |    Needs tests:  0
Patch needs improvement:  0         |  Easy pickings:  0
------------------------------------+---------------------------------
Description changed by ramiro:

Old description:

> When applying .distinct() to a QuerySet, performing an aggregation using
> .aggregate() on that will only ensure that the aggregation result is
> distinct.  As a result, this prevents you from doing a query where you
> want the input to the aggregation to have distinct rows.
>
> Take the example models:
>
> {{{
> class Category(Model):
>   label = CharField(max_length=128)
>
> class Food(Model):
>   name = CharField(max_length=128)
>   categories = ManyToManyField(Category)
>   kilojoules_per_kg = PositiveIntegerField()
> }}}
>
> If you then did a query on which you selected a couple of Categories as
> foreign keys (eg: "Pizza Toppings" and "Fruits"), you'd end up with a
> list of Foods that would contain duplicates where a food item fits in
> multiple categories (eg: "Apple", "Orange", "Pineapple", "Pineapple",
> "Bacon" and "Cheese" ...).
>
> If you then wanted to then find the average number of kilojoules per
> kilogram for items in those groups and eliminate duplicates (Pineapple),
> the documentation would lead you to do this:
>
> {{{
> qs.distinct().aggregate(Average('kilojoules_per_kg'))
> }}}
>
> However instead of counting Pineapple once, Pineapple would be counted
> twice, as the distinct() call would only apply to the result of the
> aggregation, not to the input of it.
>
> {{{
> >>> qs.distinct().aggregate(Average('kilojoules_per_kg')) ==
> qs.aggregate(Average('kilojoules_per_kg'))
> True
> }}}
>
> Instead, .distinct() before .aggregate() should instead cause a subquery
> to be executed to filter the results before passing it to the aggregation
> functions.  There doesn't seem a way to presently execute a query like
> this using the Django ORM.
>
> .distinct() after an aggregation should cause the results of the
> aggregation to be filtered so only distinct records are returned (which
> is the current behavior).
>
> I've managed to confirm this behavior in Django 1.2.3 and 1.3.

New description:

 When applying .distinct() to a QuerySet, performing an aggregation using
 .aggregate() on that will only ensure that the aggregation result is
 distinct.  As a result, this prevents you from doing a query where you
 want the input to the aggregation to have distinct rows.

 Take the example models:

 {{{
 class Category(Model):
   label = CharField(max_length=128)

 class Food(Model):
   name = CharField(max_length=128)
   categories = ManyToManyField(Category)
   kilojoules_per_kg = PositiveIntegerField()
 }}}

 If you then did a query on which you selected a couple of Categories as
 foreign keys (eg: "Pizza Toppings" and "Fruits"), you'd end up with a list
 of Foods that would contain duplicates where a food item fits in multiple
 categories (eg: "Apple", "Orange", "Pineapple", "Pineapple", "Bacon" and
 "Cheese" ...).

 If you then wanted to then find the average number of kilojoules per
 kilogram for items in those groups and eliminate duplicates (Pineapple),
 the documentation would lead you to do this:

 {{{
 qs.distinct().aggregate(Avg('kilojoules_per_kg'))
 }}}

 However instead of counting Pineapple once, Pineapple would be counted
 twice, as the distinct() call would only apply to the result of the
 aggregation, not to the input of it.

 {{{
 >>> qs.distinct().aggregate(Avg('kilojoules_per_kg')) ==
 qs.aggregate(Avg('kilojoules_per_kg'))
 True
 }}}

 Instead, .distinct() before .aggregate() should instead cause a subquery
 to be executed to filter the results before passing it to the aggregation
 functions.  There doesn't seem a way to presently execute a query like
 this using the Django ORM.

 .distinct() after an aggregation should cause the results of the
 aggregation to be filtered so only distinct records are returned (which is
 the current behavior).

 I've managed to confirm this behavior in Django 1.2.3 and 1.3.

--

-- 
Ticket URL: <https://code.djangoproject.com/ticket/16025#comment:4>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en.

Reply via email to