#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-------------------------------------+-------------------------------------
Reporter: Robert Leach | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: sql, distinct, | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Robert Leach):
Replying to [comment:6 Mariusz Felisiak]:
> This would be another logic that's implicit and probably unexpected by
users (at least in some cases). As far as I'm aware it's preferable to
fail loudly even with a `ProgrammingError`. I'm not sure how to improve
this note, maybe it's enough to add a correct example:
>
> {{{#!diff
> diff --git a/docs/ref/models/querysets.txt
b/docs/ref/models/querysets.txt
> index a9da1dcf7e..891b8255b0 100644
> --- a/docs/ref/models/querysets.txt
> +++ b/docs/ref/models/querysets.txt
> @@ -565,7 +565,9 @@ Examples (those after the first will only work on
PostgreSQL)::
> ...wouldn't work because the query would be ordered by
``blog__name`` thus
> mismatching the ``DISTINCT ON`` expression. You'd have to
explicitly order
> by the relation ``_id`` field (``blog_id`` in this case) or the
referenced
> - one (``blog__pk``) to make sure both expressions match.
> + one (``blog__pk``) to make sure both expressions match::
> +
> + Entry.objects.order_by('blog_id').distinct('blog_id')
>
> ``values()``
> ~~~~~~~~~~~~
> }}}
I'm not sure that would have been enough for me to have anticipated or
correctly interpreted the error I encountered. I just corrected our code
base and I now feel I have a clearer picture of how I got here... Let me
explain why. I'd thought my expressions did match:
{{{
distinct_fields = ['name', 'pk',
'compounds__testsynonyms__compound', 'compounds__testsynonyms__name',
'compounds__testsynonyms__pk', 'compounds__name', 'compounds__pk']
qs =
TestPeak.objects.filter(q_exp).order_by(*distinct_fields).distinct(*distinct_fields)
}}}
I was like "but I'm supplying the exact same fields for both order_by and
distinct".
What I think I'd missed was that "referenced field" in the phrase
`explicitly order by the relation _id or referenced field` doesn't just
apply to related model fields in the queried("/root") model (e.g.
`Entry`'s fields such as `blog` in
`Entry.objects.order_by('blog').distinct('blog')`, but applies to all
related models through all my joins - **and** (indirectly/potentially)
their `meta.ordering` values... because I wasn't explicitly adding the
`compounds__testsynonyms__compound` "field" above in my codebase where I
ran into this error - I was implicitly adding them -> I had written code
to straightforwardly grab and prepend the meta.ordering fields in order to
satisfy distinct's requirements and still have things ordered nicely - and
that's where I was running into this problem, because it didn't occur to
me that the fields in meta.ordering could be changed differently by
`order_by` and `distinct`. meta.ordering lets you add those "blog" fields
and I wasn't the developer that had added them to our code base, but I
didn't even think to check whether those fields I was adding would be
subject to the distinct/order_by gotcha.
I mean - all the information is there in the doc for me to have avoided
this. You just have to piece together doc info on `meta.ordering`,
`order_by`, and `distinct` and extend the examples to realize it applies
to joined models' fields - not just the root model. It was just a bit too
complex for me at my level of django experience to anticipate correctly
the implications of what I was doing.
**Maybe the best thing to do would be to anticipate the motivations that
lead me down this path...** I wanted to use distinct on a join query,
which meant that I needed to add fields to order_by that I didn't really
care about - I just needed to add them to meet the requirement that the
fields matched. BUT - I didn't want to change the desired ordering - so I
needed to explicitly add the default orderings so that the IDs wouldn't
override them. So maybe the doc should just point out that if you're
adding fields to order_by solely to be able to use distinct, but you don't
want to change the default ordering specified in meta.ordering - you need
to consider the fact that the meta.ordering fields explicitly added can
make you run into the order_by/distinct gotcha.
--
Ticket URL: <https://code.djangoproject.com/ticket/33682#comment:7>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/01070180aec6ef98-b776dbb2-2681-4b7d-b985-e6eb9cca7985-000000%40eu-central-1.amazonses.com.