#6422: Support for 'DISTINCT ON' queries with QuerySet.distinct()
-------------------------------------+-------------------------------------
Reporter: Manfred Wassmann | Owner: jgelens
<manolo@…> | Status: assigned
Type: New feature | Version: SVN
Component: Database layer | Resolution:
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: dceu2011 | Patch needs improvement: 0
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by akaariai):
* cc: anssi.kaariainen@… (added)
Comment:
Sorry, I still found one more problem, and some smaller issues:
- The distinct() in postgresql_psycopg2/operations.py doesn't take
aliases into account. It assumes that the distinct is always for the main
table alias. If you first do a filter on m2m relation, then a distinct on
for the m2m relation you get:
{{{
# I added coworkers = models.ManyToManyField('self') to Staff model used
in tests
>>> print
Staff.objects.distinct('coworkers__name').order_by('coworkers__name').query
SELECT DISTINCT ON ("queries_staff"."name")
"queries_staff"."id", "queries_staff"."name",
"queries_staff"."organisation", T3."name"
FROM "queries_staff"
LEFT OUTER JOIN "queries_staff_coworkers"
ON ("queries_staff"."id" = "queries_staff_coworkers"."from_staff_id")
LEFT OUTER JOIN "queries_staff" T3
ON ("queries_staff_coworkers"."to_staff_id" = T3."id")
ORDER BY T3."name" ASC
}}}
If you execute that query, it will error, because distinct on is for
"queries_staff"."name", but order by is for T3."name"
- In the attached patch there is a different approach of join cleanup.
It also fixes one other long-standing expected_failure test in queries.
The failure is related to similar "lingering joins" done by ordering.
- The `obj.query.distinct_fields = []` in query.distinct() is not
needed. The variable will be set anyways in
sql.query.add_distinct_fields(). obj.query.distinct = True can also be set
in add_distinct_fields()
- Should the joins be generated as OUTER or INNER joins? Not sure about
this. I guess either way will be fine. The attached patch generates OUTER
JOINS (because it uses similar code than the order by join creation).
- Prevent different distinct_fields lists when combining querysets.
I attached a patch which fixes these issues. It isn't the cleanest one
(non-DRY, code copy from order_by). But it can be used as a basis for a
cleaner patch if needed.
I did also check out how to support aggregates with distinct_fields.
However I had a hard time figuring out a realistic test case, so I removed
that part. So, jgelens is absolutely correct here: better to first find an
use case :)
All in all, the only major issue is the correct alias usage in
compiler.py. The rest is just nitpicking. In my opinion the alias usage
needs to be fixed before commit, but I may be over-complicating things
again... Other than that I think this feature is now very solid. I might
be fine to just let the distinct() generated joins linger in there after
all. That is what order_by has done for ages.
The attached patch passes all tests, with one long-standing expected
failure now gone. As said the patch is just to show one possible approach
to the alias issue. I hope jgelens will check it out and see what to do to
the above mentioned issues (if anything).
--
Ticket URL: <https://code.djangoproject.com/ticket/6422#comment:61>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/django-updates?hl=en.