#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-------------------------------------+-------------------------------------
     Reporter:  Robert Leach         |                    Owner:  nobody
         Type:  Bug                  |                   Status:  new
    Component:  Database layer       |                  Version:  3.2
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  sql, distinct,       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Robert Leach):

 Replying to [comment:6 Mariusz Felisiak]:
 > This would be another logic that's implicit and probably unexpected by
 users (at least in some cases). As far as I'm aware it's preferable to
 fail loudly even with a `ProgrammingError`. I'm not sure how to improve
 this note, maybe it's enough to add a correct example:
 >
 > {{{#!diff
 > diff --git a/docs/ref/models/querysets.txt
 b/docs/ref/models/querysets.txt
 > index a9da1dcf7e..891b8255b0 100644
 > --- a/docs/ref/models/querysets.txt
 > +++ b/docs/ref/models/querysets.txt
 > @@ -565,7 +565,9 @@ Examples (those after the first will only work on
 PostgreSQL)::
 >      ...wouldn't work because the query would be ordered by
 ``blog__name`` thus
 >      mismatching the ``DISTINCT ON`` expression. You'd have to
 explicitly order
 >      by the relation ``_id`` field (``blog_id`` in this case) or the
 referenced
 > -    one (``blog__pk``) to make sure both expressions match.
 > +    one (``blog__pk``) to make sure both expressions match::
 > +
 > +        Entry.objects.order_by('blog_id').distinct('blog_id')
 >
 >  ``values()``
 >  ~~~~~~~~~~~~
 > }}}

 I'm not sure that would have been enough for me to have anticipated or
 correctly interpreted the error I encountered.  I just corrected our code
 base and I now feel I have a clearer picture of how I got here...  Let me
 explain why.  I'd thought my expressions did match:

 {{{
         distinct_fields = ['name', 'pk',
 'compounds__testsynonyms__compound', 'compounds__testsynonyms__name',
 'compounds__testsynonyms__pk', 'compounds__name', 'compounds__pk']
         qs =
 
TestPeak.objects.filter(q_exp).order_by(*distinct_fields).distinct(*distinct_fields)
 }}}

 I was like "but I'm supplying the exact same fields for both order_by and
 distinct".

 What I think I'd missed was that "referenced field" in the phrase
 `explicitly order by the relation _id or referenced field` doesn't just
 apply to related model fields in the queried("/root") model (e.g.
 `Entry`'s fields such as `blog` in
 `Entry.objects.order_by('blog').distinct('blog')`, but applies to all
 related models through all my joins - **and** (indirectly/potentially)
 their `meta.ordering` values... because I wasn't explicitly adding the
 `compounds__testsynonyms__compound` "field" above in my codebase where I
 ran into this error - I was implicitly adding them -> I had written code
 to straightforwardly grab and prepend the meta.ordering fields in order to
 satisfy distinct's requirements and still have things ordered nicely - and
 that's where I was running into this problem, because it didn't occur to
 me that the fields in meta.ordering could be changed differently by
 `order_by` and `distinct`.  meta.ordering lets you add those "blog" fields
 and I wasn't the developer that had added them to our code base, but I
 didn't even think to check whether those fields I was adding would be
 subject to the distinct/order_by gotcha.

 I mean - all the information is there in the doc for me to have avoided
 this.  You just have to piece together doc info on `meta.ordering`,
 `order_by`, and `distinct` and extend the examples to realize it applies
 to joined models' fields - not just the root model.  It was just a bit too
 complex for me at my level of django experience to anticipate correctly
 the implications of what I was doing.

 **Maybe the best thing to do would be to anticipate the motivations that
 lead me down this path...** I wanted to use distinct on a join query,
 which meant that I needed to add fields to order_by that I didn't really
 care about - I just needed to add them to meet the requirement that the
 fields matched.  BUT - I didn't want to change the desired ordering - so I
 needed to explicitly add the default orderings so that the IDs wouldn't
 override them.  So maybe the doc should just point out that if you're
 adding fields to order_by solely to be able to use distinct, but you don't
 want to change the default ordering specified in meta.ordering - you need
 to consider the fact that the meta.ordering fields explicitly added can
 make you run into the order_by/distinct gotcha.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33682#comment:7>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070180aec6ef98-b776dbb2-2681-4b7d-b985-e6eb9cca7985-000000%40eu-central-1.amazonses.com.

Reply via email to