#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-------------------------------------+-------------------------------------
     Reporter:  Robert Leach         |                    Owner:  nobody
         Type:  Bug                  |                   Status:  new
    Component:  Database layer       |                  Version:  3.2
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  sql, distinct,       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Robert Leach):

 Replying to [comment:4 Mariusz Felisiak]:
 > Robert, Can you propose a documentation improvement via GitHub's PR?

 I can certainly give it a shot, though I'm not the best writer when it
 comes to brevity.

 Also, I don't have a deep understanding of the related Django code, so my
 understanding could be empirically correct, but technically flawed (like
 Bohr's model of the atom).  For example, when the same field reference is
 supplied to both `.order_by()` and `.distinct()`, such as in Simon's
 example:

 {{{
 TestSynonym.objects.distinct('compound').order_by('compound')
 }}}

 ...why are the fields inserted in each case not coordinated?  Why does the
 same reference (`compound`) resolve differently in each method?  Simon
 says it resolves to:

 {{{
 list(TestSynonym.objects.distinct('compound').order_by('compound__name'))
 }}}

 but based on the debug output from another test that uses the call above,
 that's imprecise.  The generated SQL is:

 {{{
 QUERY: SELECT DISTINCT ON ("DataRepo_testsynonym"."compound_id")
 "DataRepo_testsynonym"."name", "DataRepo_testsynonym"."compound_id" FROM
 "DataRepo_testsynonym" INNER JOIN "DataRepo_testcompound" ON
 ("DataRepo_testsynonym"."compound_id" = "DataRepo_testcompound"."id")
 ORDER BY "DataRepo_testcompound"."name" ASC
 }}}

 which means that the same reference resolves to a different column in each
 clause:

 - `distinct`: `compound_id`
 - `order_by`: `name`
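
 To make the mismatch concrete, here is a minimal sketch in plain Python
 (not Django's actual code) of the rule that the `DISTINCT ON` columns must
 lead the `ORDER BY` columns.  The helper name `distinct_matches_order_by`
 is hypothetical, and comparing the columns in order is a simplifying
 assumption on my part; the column strings are copied from the query above.

```python
# Hypothetical helper, not part of Django: a simplified, order-sensitive
# check that the DISTINCT ON columns are the leading ORDER BY columns.
def distinct_matches_order_by(distinct_cols, order_by_cols):
    """Return True if order_by_cols starts with distinct_cols."""
    n = len(distinct_cols)
    return list(order_by_cols[:n]) == list(distinct_cols)


# Columns taken from the generated query shown above.
distinct_cols = ['"DataRepo_testsynonym"."compound_id"']
order_by_cols = ['"DataRepo_testcompound"."name"']

# The two clauses resolved the same reference to different columns,
# so the check fails for this query.
mismatch = not distinct_matches_order_by(distinct_cols, order_by_cols)
```

 With the columns from the debug output, `mismatch` comes out true, which is
 the inconsistency I'm describing.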

 When those methods are assessed individually, I understand why those
 columns are the preferred resolution (e.g. the meta ordering may not be
 unique), but given that `distinct` requires the same fields to be present
 at the beginning of the order-by, I don't know what prevents the code from
 resolving them in a mutually compatible way.  For example, why not convert
 the reference into 2 fields that together meet both requirements (`name`
 AND `compound_id`)?  Both order-by and distinct would be satisfied.  Or,
 in my case, `name` is unique, so distinct could resolve to the meta
 ordering without issue.
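
 As a rough sketch of the two-field idea (plain Python that only builds
 strings, not a proposed patch), the same column list could be used for both
 clauses.  The function name `build_clauses` is hypothetical, and the sketch
 assumes that `compound_id` determines the compound's `name`, so adding
 `name` to the distinct columns would not change which rows are kept.

```python
# Hypothetical illustration, not Django code: expand one reference into two
# columns and reuse the same list for DISTINCT ON and ORDER BY.
def build_clauses(order_col, distinct_col):
    """Return (distinct_clause, order_by_clause) sharing one column list."""
    # Ordering column first, so ORDER BY still sorts by it; the key column
    # second, so distinctness is still decided per related row.
    joined = ", ".join([order_col, distinct_col])
    return f"DISTINCT ON ({joined})", f"ORDER BY {joined}"


distinct_clause, order_by_clause = build_clauses(
    '"DataRepo_testcompound"."name"',
    '"DataRepo_testsynonym"."compound_id"',
)
```

 Both clauses then list `name` followed by `compound_id`, so the distinct
 columns are exactly the leading order-by columns.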

 Is there a technical reason the code doesn't already do this?

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33682#comment:5>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
