#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.7
(models, ORM) | Keywords: distinct order_by
Severity: Normal | subquery
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
When I try to order a queryset that uses {{{distinct(...)}}}, e.g.:
{{{
#!python
qs2 = qs.order_by('a', 'b').distinct('a')
qs3 = qs2.order_by('c')
}}}
then Django seems to forget about the first {{{order_by}}} and apply the
second {{{order_by}}} before applying {{{distinct}}}. I would expect the
second {{{order_by}}} to be applied after applying {{{distinct}}}.
This is particularly troublesome when one wants to create a custom manager
on top of {{{qs2}}}.
I don't know if this is the intended behavior, but it is very unexpected
to me (see the section "Rationale").
=== Steps to reproduce ===
Use the Postgre SQL backend.
In models.py:
{{{
#!python
from django.db import models
class YoungestAmongNamesakes(models.Manager):
def get_queryset(self):
return super(YoungestAmongNamesakes, self).get_queryset().\
order_by('name', 'age').distinct('name')
class Human(models.Model):
nick = models.CharField(max_length=20)
name = models.CharField(max_length=20)
age = models.IntegerField()
objects = models.Manager()
youngest_among_namesakes = YoungestAmongNamesakes()
}}}
In the interactive shell:
{{{
#!python
Human.objects.create(nick='foo', name='Helen', age=20)
Human.objects.create(nick='bar', name='Helen', age=23)
Human.objects.create(nick='baz', name='Jennifer', age=15)
for human in Human.youngest_among_namesakes.all():
print human.nick
for human in Human.youngest_among_namesakes.order_by('name', 'nick'):
print human.nick
for human in Human.youngest_among_namesakes.order_by('nick'):
print human.nick
}}}
=== Actual results ===
The first iteration outputs:
{{{
foo
baz
}}}
The second iteration outputs:
{{{
bar
baz
}}}
The third iteration raises
{{{
ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER
BY expressions
}}}
{{{Human.youngest_among_namesakes.all()}}} generates:
{{{
SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
"myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
"myapp_human" ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC
}}}
{{{Human.youngest_among_namesakes.order_by('name', 'nick')}}} generates:
{{{
SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
"myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
"myapp_human" ORDER BY "myapp_human"."name" ASC, "myapp_human"."nick" ASC
}}}
Similarly, {{{Human.youngest_among_namesakes.order_by('nick')}}}
generates:
{{{
SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
"myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
"myapp_human" ORDER BY "myapp_human"."nick" ASC
}}}
=== Expected results ===
I would expect the second iteration to output:
{{{
foo
baz
}}}
I would expect the third iteration to output:
{{{
baz
foo
}}}
I would expect {{{Human.youngest_among_namesakes.order_by('name',
'nick')}}} to generate the following query:
{{{
SELECT "myapp_human"."id", "myapp_human"."nick", "myapp_human"."name",
"myapp_human"."age" FROM "myapp_human" WHERE "myapp_human"."id" IN (SELECT
DISTINCT ON ("myapp_human"."name") "myapp_human"."id" FROM "myapp_human"
ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC) ORDER BY
"myapp_human"."name" ASC, "myapp_human"."nick" ASC
}}}
Similarly, I would expect
{{{Human.youngest_among_namesakes.order_by('nick')}}} to generate the
following query:
{{{
SELECT "myapp_human"."id", "myapp_human"."nick", "myapp_human"."name",
"myapp_human"."age" FROM "myapp_human" WHERE "myapp_human"."id" IN (SELECT
DISTINCT ON ("myapp_human"."name") "myapp_human"."id" FROM "myapp_human"
ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC) ORDER BY
"myapp_human"."nick" ASC
}}}
=== Rationale ===
I would expect any queryset {{{qs}}} to contain the same objects as
{{{qs.order_by(...)}}}, just in a possibly different order.
=== Workaround ===
As a (suboptimal) workaround, one might use a subquery instead:
{{{
#!python
class YoungestAmongNamesakes(models.Manager):
def get_queryset(self):
qs = super(YoungestAmongNamesakes, self).get_queryset().\
order_by('name', 'age').distinct('name')
return super(YoungestAmongNamesakes,
self).get_queryset().filter(pk__in=qs)
}}}
This however generates unnecessarily complex SQL query for
{{{Human.youngest_among_namesakes.all()}}}.
=== Suggestion ===
I would suggest to rewrite {{{django.db.models.query.QuerySet.order_by}}}
not to unconditionally clear the previous ordering, but to first check
whether {{{distinct(...)}}} is used in the present queryset and use a
subquery in such a case.
Something like this (the code is untested and written quickly just to
document my thought):
{{{
#!python
def order_by(self, *field_names):
"""
Returns a new QuerySet instance with the ordering changed.
"""
assert self.query.can_filter(), \
"Cannot reorder a query once a slice has been taken."
obj = self._clone()
if self.query.distinct and self.query.distinct_fields:
from django.db.models.sql.subqueries import AggregateQuery
subquery = obj
obj = AggregateQuery(obj.model)
try:
obj.add_subquery(subquery, using=self.db)
except EmptyResultSet:
obj = subquery
else:
obj.query.clear_ordering(force_empty=False)
obj.query.add_ordering(*field_names)
return obj
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24462>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/049.f697a20af0f8a3453a19a38ad568ae92%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.