#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
     Reporter:  trosos               |      Owner:  nobody
         Type:  Uncategorized        |     Status:  new
    Component:  Database layer       |    Version:  1.7
  (models, ORM)                      |   Keywords:  distinct order_by
     Severity:  Normal               |  subquery
 Triage Stage:  Unreviewed           |  Has patch:  0
Easy pickings:  0                    |      UI/UX:  0
-------------------------------------+-------------------------------------
 When I try to order a queryset that uses {{{distinct(...)}}}, e.g.:
 {{{
 #!python
 qs2 = qs.order_by('a', 'b').distinct('a')
 qs3 = qs2.order_by('c')
 }}}
 then Django seems to forget about the first {{{order_by}}} and apply the
 second {{{order_by}}} before applying {{{distinct}}}. I would expect the
 second {{{order_by}}} to be applied after applying {{{distinct}}}.

 This is particularly troublesome when one wants to create a custom manager
 on top of {{{qs2}}}.

 I don't know if this is the intended behavior, but it is very unexpected
 to me (see the section "Rationale").

 === Steps to reproduce ===

 Use the Postgre SQL backend.

 In models.py:
 {{{
 #!python
 from django.db import models

 class YoungestAmongNamesakes(models.Manager):
     def get_queryset(self):
         return super(YoungestAmongNamesakes, self).get_queryset().\
           order_by('name', 'age').distinct('name')

 class Human(models.Model):
     nick = models.CharField(max_length=20)
     name = models.CharField(max_length=20)
     age = models.IntegerField()

     objects = models.Manager()
     youngest_among_namesakes = YoungestAmongNamesakes()
 }}}

 In the interactive shell:
 {{{
 #!python
 Human.objects.create(nick='foo', name='Helen', age=20)
 Human.objects.create(nick='bar', name='Helen', age=23)
 Human.objects.create(nick='baz', name='Jennifer', age=15)

 for human in Human.youngest_among_namesakes.all():
     print human.nick

 for human in Human.youngest_among_namesakes.order_by('name', 'nick'):
     print human.nick

 for human in Human.youngest_among_namesakes.order_by('nick'):
     print human.nick
 }}}

 === Actual results ===

 The first iteration outputs:
 {{{
 foo
 baz
 }}}

 The second iteration outputs:
 {{{
 bar
 baz
 }}}

 The third iteration raises
 {{{
 ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER
 BY expressions
 }}}

 {{{Human.youngest_among_namesakes.all()}}} generates:
 {{{
 SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
 "myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
 "myapp_human" ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC
 }}}

 {{{Human.youngest_among_namesakes.order_by('name', 'nick')}}} generates:
 {{{
 SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
 "myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
 "myapp_human" ORDER BY "myapp_human"."name" ASC, "myapp_human"."nick" ASC
 }}}

 Similarly, {{{Human.youngest_among_namesakes.order_by('nick')}}}
 generates:
 {{{
 SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
 "myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
 "myapp_human" ORDER BY "myapp_human"."nick" ASC
 }}}

 === Expected results ===

 I would expect the second iteration to output:
 {{{
 foo
 baz
 }}}

 I would expect the third iteration to output:
 {{{
 baz
 foo
 }}}

 I would expect {{{Human.youngest_among_namesakes.order_by('name',
 'nick')}}} to generate the following query:
 {{{
 SELECT "myapp_human"."id", "myapp_human"."nick", "myapp_human"."name",
 "myapp_human"."age" FROM "myapp_human" WHERE "myapp_human"."id" IN (SELECT
 DISTINCT ON ("myapp_human"."name") "myapp_human"."id" FROM "myapp_human"
 ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC) ORDER BY
 "myapp_human"."name" ASC, "myapp_human"."nick" ASC
 }}}

 Similarly, I would expect
 {{{Human.youngest_among_namesakes.order_by('nick')}}} to generate the
 following query:
 {{{
 SELECT "myapp_human"."id", "myapp_human"."nick", "myapp_human"."name",
 "myapp_human"."age" FROM "myapp_human" WHERE "myapp_human"."id" IN (SELECT
 DISTINCT ON ("myapp_human"."name") "myapp_human"."id" FROM "myapp_human"
 ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC) ORDER BY
 "myapp_human"."nick" ASC
 }}}

 === Rationale ===

 I would expect any queryset {{{qs}}} to contain the same objects as
 {{{qs.order_by(...)}}}, just in a possibly different order.

 === Workaround ===

 As a (suboptimal) workaround, one might use a subquery instead:

 {{{
 #!python
 class YoungestAmongNamesakes(models.Manager):
     def get_queryset(self):
         qs = super(YoungestAmongNamesakes, self).get_queryset().\
           order_by('name', 'age').distinct('name')
         return super(YoungestAmongNamesakes,
 self).get_queryset().filter(pk__in=qs)
 }}}

 This however generates unnecessarily complex SQL query for
 {{{Human.youngest_among_namesakes.all()}}}.

 === Suggestion ===

 I would suggest to rewrite {{{django.db.models.query.QuerySet.order_by}}}
 not to unconditionally clear the previous ordering, but to first check
 whether {{{distinct(...)}}} is used in the present queryset and use a
 subquery in such a case.

 Something like this (the code is untested and written quickly just to
 document my thought):

 {{{
 #!python
     def order_by(self, *field_names):
         """
         Returns a new QuerySet instance with the ordering changed.
         """
         assert self.query.can_filter(), \
             "Cannot reorder a query once a slice has been taken."
         obj = self._clone()

         if self.query.distinct and self.query.distinct_fields:
             from django.db.models.sql.subqueries import AggregateQuery
             subquery = obj

             obj = AggregateQuery(obj.model)
             try:
                 obj.add_subquery(subquery, using=self.db)
             except EmptyResultSet:
                 obj = subquery
         else:
             obj.query.clear_ordering(force_empty=False)

         obj.query.add_ordering(*field_names)
         return obj
 }}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24462>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/049.f697a20af0f8a3453a19a38ad568ae92%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to