#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-------------------------------------+-------------------------------------
Reporter: Robert Leach | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: sql, distinct, | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Robert Leach):
Replying to [comment:4 Mariusz Felisiak]:
> Robert, Can you propose a documentation improvement via GitHub's PR?
I can certainly give it a shot, though I'm not the best writer when it
comes to brevity.
Also, I don't have a deep understanding of the related Django code, so my
mental model could be empirically correct but technically flawed (like
Bohr's model of the atom). For example, when the same field reference is
supplied to both `.order_by()` and `.distinct()`, such as in Simon's
example:
{{{
TestSynonym.objects.distinct('compound').order_by('compound')
}}}
...why are the fields substituted in each case not coordinated? Why does
the same reference (`compound`) resolve differently in the two methods?
Simon says it resolves to:
{{{
list(TestSynonym.objects.distinct('compound').order_by('compound__name'))
}}}
but according to the debug output from another test of mine that makes the
call above, that's imprecise. The output shows:
{{{
QUERY: SELECT DISTINCT ON ("DataRepo_testsynonym"."compound_id")
"DataRepo_testsynonym"."name", "DataRepo_testsynonym"."compound_id" FROM
"DataRepo_testsynonym" INNER JOIN "DataRepo_testcompound" ON
("DataRepo_testsynonym"."compound_id" = "DataRepo_testcompound"."id")
ORDER BY "DataRepo_testcompound"."name" ASC
}}}
which means that the `distinct` and `order_by` field references resolve
to:
- `distinct`: `compound_id` (on `DataRepo_testsynonym`)
- `order_by`: `name` (on `DataRepo_testcompound`)
When those methods are assessed individually, I understand why those
fields are the preferred solution (e.g. the meta ordering may not be
unique), but given that `distinct` requires the same fields to be present
at the beginning of the order-by, I don't know what prevents the code from
being written so that both references resolve in a mutually compatible
way.
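To illustrate the rule I'm relying on, here is a sketch in plain Python.
It is only a simplified model of my understanding, not Django's or
PostgreSQL's actual logic: the function name is hypothetical, and
expressions are compared positionally as plain strings.

```python
def distinct_on_matches_order_by(distinct_on, order_by):
    """Return True if each leading ORDER BY expression equals the
    DISTINCT ON expression in the same position.

    Hypothetical helper: a simplified, positional model of the rule.
    """
    return all(d == o for d, o in zip(distinct_on, order_by))


# Column expressions copied from the debug output above.
distinct_on = ['"DataRepo_testsynonym"."compound_id"']
order_by = ['"DataRepo_testcompound"."name"']

# The resolution shown in the debug output: the leading columns differ.
mismatch = distinct_on_matches_order_by(distinct_on, order_by)

# Putting the DISTINCT ON column first in the ORDER BY makes them agree.
consistent = distinct_on_matches_order_by(distinct_on, distinct_on + order_by)
```

Under this model, the generated query fails the check because
`compound_id` and `name` are different leading expressions.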
For example, why not expand the reference into two fields that together
meet both requirements (`name` AND `compound_id`)? Both order-by and
distinct would then be satisfied. Or, in my case, `name` is unique, so
distinct could resolve to the meta ordering without issue.
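To make the first suggestion concrete, here is a sketch in plain Python of
what I have in mind. The function is hypothetical and is not how Django
resolves references today; it simply expands one foreign-key reference
into a single column list that both clauses share.

```python
def resolve_reference(fk_column, related_ordering):
    """Hypothetical: expand one FK reference into a column list that is
    used for both DISTINCT ON and ORDER BY."""
    columns = list(related_ordering)
    # Append the FK column so the list stays unique per related row even
    # when the related model's meta ordering is not unique.
    if fk_column not in columns:
        columns.append(fk_column)
    return columns


# Column names taken from the debug output; the ordering list stands in
# for the related model's meta ordering.
shared = resolve_reference(
    '"DataRepo_testsynonym"."compound_id"',
    ['"DataRepo_testcompound"."name"'],
)

distinct_clause = "DISTINCT ON (" + ", ".join(shared) + ")"
order_clause = "ORDER BY " + ", ".join(column + " ASC" for column in shared)
```

Because each `compound_id` points at exactly one `name`, I would expect
`DISTINCT ON (name, compound_id)` to keep one row per compound, just as
`DISTINCT ON (compound_id)` does, while also sorting by `name`.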
Is there a technical reason the code doesn't already do this?
--
Ticket URL: <https://code.djangoproject.com/ticket/33682#comment:5>