#24218: Use sub-query in ORM when distinct and order_by columns do not match
-------------------------------------+-------------------------------------
Reporter: miki725 | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: subquery distinct | Triage Stage: Accepted
order_by |
Has patch: 1 | Needs documentation: 1
Needs tests: 1 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by miki725):
Thank you for such quick responses!
This business of unpredictable results is what makes me most
uncomfortable with the patch.
I completely agree with that statement. Let me explain my-use case and
then maybe possible solutions.
{{{#!python
class ModelFoo(models.Model):
bars = models.ManyToManyField(ModelBar, related_name='foos')
class ModelBar(models.Model):
latlon = models.PointField(spatial_index=True)
# query foos where they have related bars within certain distance
# and sort by increasing distance
(ModelFoo.objects
# filter by distance
.filter(bars__latlon__distance_lte=(point, Distance(mi=1)))
# add distance annotation
.distance(point, field_name='bars__latlon')
# sort by distance
.extra(order_by=['distance'])
# distinct foos
.distinct('id'))
}}}
The reason I needed distinct here is because otherwise whenever foo had
multiple bars, I would get duplicate foo models due to the join. Since in
my case I simply used bars for filtering foos, I didnt care about
"unpredictable" order at all since I only cared about foos. As for
sorting, initially I also thought that {{{ORDER BY}}} should be inside the
inner query and {{{DISTINCT}}} on the outer query however for some reason
when I did that, foos were not sorted by distance anymore.
This however is just my use-case and I do see how this can cause issues in
other scenarios since you cannot be sure what comes first in the
{{{DISTINCT}}}.
Here are some possible solutions:
1. It seems that whenever a pattern of many-guys is used to filter single
guys on one/many-to-many and sorting is required, this pattern might be
useful. So maybe this functionality should only be enabled in those cases.
2. What about if this behavior will be configurable? More specifically
what about if the user will be able to control:
* explicitly enable this functionary since otherwise Django cannot
guarantee data integrity for all use-cases. This will force the user to
explicitly acknowledge they want to do this hence eliminating the burden
from Django to always provide data-integrity (and explain all the cases
when it is not provided).
* what goes to inner and outer subquery ({{{ORDER BY}}} or
{{{DISTINCT}}}). Maybe even allow {{{ORDER BY}}} in both queries. So the
user will pick some sort columns for both inner and outer query hence
solving "predictability" of {{{DISTINCT}}} and allowing to sort overall
results.
I do like the idea of making these things customizable however not sure
what would be an API to do those customizations. Is there precedent like
that in Django ORM for query customizations?
3. Perhaps the right solution is try not to solve my particular use-case
but provide a more generic solution to use nested queries. Maybe use API
something like:
{{{#!python
Model.objects.nested(
Model.objects.filter(...).distinct(...).order_by(...) # inner
).order_by(...) # outer
}}}
Let me know your thoughts.
--
Ticket URL: <https://code.djangoproject.com/ticket/24218#comment:6>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/065.7bf7c62086128dcf5e15e2a63a608be7%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.