#6422: Support for 'DISTINCT ON' queries with QuerySet.distinct()
-------------------------------------+-------------------------------------
Reporter: Manfred Wassmann | Owner: jgelens
<manolo@…> | Status: assigned
Type: New feature | Version: SVN
Component: Database layer | Resolution:
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: dceu2011 | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Comment (by akaariai):
Assume you have models:
{{{
class A1(models.Model):
foo = models.TextField()
class B1(models.Model):
a = models.ForeignKey(A1)
}}}
And you do:
{{{
qs1 = B1.objects.distinct('a__id', 'a__foo')
print qs1.query
qs2 = qs1.distinct('pk')
print qs2.query
}}}
Now you will have a join to A even though you don't have a reference to
it. The queries generated:
{{{
SELECT DISTINCT ON ("obj_creation_speed_a1"."id",
"obj_creation_speed_a1"."foo") "obj_creation_speed_b1"."id",
"obj_creation_speed_b1"."a_id" FROM "obj_creation_speed_b1" INNER JOIN
"obj_creation_speed_a1" ON ("obj_creation_speed_b1"."a_id" =
"obj_creation_speed_a1"."id")
SELECT DISTINCT ON ("obj_creation_speed_b1"."id")
"obj_creation_speed_b1"."id", "obj_creation_speed_b1"."a_id" FROM
"obj_creation_speed_b1" INNER JOIN "obj_creation_speed_a1" ON
("obj_creation_speed_b1"."a_id" = "obj_creation_speed_a1"."id")
}}}
Note that in the second query, there is a join into A even if no field of
A is used in the query. The join should be removed. But I don't think this
is a blocker issue. If you want to fix this, there are at least these two
choices:
- Apply the distinct on in the compiler when the query is generated,
order by is handle this way IIRC.
- Keep record of which aliases are referenced by the DISTINCT ON added
fields, and subtract the refcount by one for those aliases if the distinct
on is changed.
As said before, in my opinion this is not a must-fix in this ticket.
Now, if you continue from the above created qs1 and issue
.annotate(Max('a__id')), you will get this query:
{{{
qs3 = qs1.annotate(Max('a__id'))
print qs3.query
list(qs3)
}}}
The query generated is this:
{{{
SELECT DISTINCT ON ("obj_creation_speed_a1"."id",
"obj_creation_speed_a1"."foo") "obj_creation_speed_b1"."id",
"obj_creation_speed_b1"."a_id", MAX("obj_creation_speed_b1"."a_id") AS
"a__id__max" FROM "obj_creation_speed_b1" INNER JOIN
"obj_creation_speed_a1" ON ("obj_creation_speed_b1"."a_id" =
"obj_creation_speed_a1"."id") GROUP BY "obj_creation_speed_b1"."id",
"obj_creation_speed_b1"."a_id"
}}}
And the list(qs3) will generate this:
{{{
django.db.utils.DatabaseError: column "obj_creation_speed_a1.id" must
appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT DISTINCT ON ("obj_creation_speed_a1"."id", "obj_creat...
^
}}}
You could "fix" this by appending the DISTINCT ON columns into the group
by clause. Unfortunately (without going into details) the results aren't
what the user would expect. If you want the results the user expects, you
would need to do a subquery. If the distinct on is in the inner query or
group by is in the inner query depends on the order in which the .annotate
and .distinct are applied. A good fix for this would be to raise
NotImplementedError("Can't handle queries with aggregates and distinct on
clause"). I have written some SQL by hand, and can't remember ever using
distinct on and group by in the same query. Maybe there is some use case,
but I just don't see the need to block on this issue, either.
--
Ticket URL: <https://code.djangoproject.com/ticket/6422#comment:59>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/django-updates?hl=en.