#24218: Use sub-query in ORM when distinct and order_by columns do not match
-------------------------------------+-------------------------------------
     Reporter:  miki725              |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  subquery distinct    |             Triage Stage:  Accepted
  order_by                           |
    Has patch:  1                    |      Needs documentation:  1
  Needs tests:  1                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by miki725):

 Thank you for such quick responses!

   This business of unpredictable results is what makes me most
 uncomfortable with the patch.

 I completely agree with that statement. Let me explain my-use case and
 then maybe possible solutions.

 {{{#!python
 class ModelFoo(models.Model):
     bars = models.ManyToManyField(ModelBar, related_name='foos')

 class ModelBar(models.Model):
     latlon = models.PointField(spatial_index=True)

 # query foos where they have related bars within certain distance
 # and sort by increasing distance
 (ModelFoo.objects
     # filter by distance
     .filter(bars__latlon__distance_lte=(point, Distance(mi=1)))
     # add distance annotation
     .distance(point, field_name='bars__latlon')
     # sort by distance
     .extra(order_by=['distance'])
     # distinct foos
     .distinct('id'))
 }}}

 The reason I needed distinct here is because otherwise whenever foo had
 multiple bars, I would get duplicate foo models due to the join. Since in
 my case I simply used bars for filtering foos, I didnt care about
 "unpredictable" order at all since I only cared about foos. As for
 sorting, initially I also thought that {{{ORDER BY}}} should be inside the
 inner query and {{{DISTINCT}}} on the outer query however for some reason
 when I did that, foos were not sorted by distance anymore.

 This however is just my use-case and I do see how this can cause issues in
 other scenarios since you cannot be sure what comes first in the
 {{{DISTINCT}}}.

 Here are some possible solutions:

 1. It seems that whenever a pattern of many-guys is used to filter single
 guys on one/many-to-many and sorting is required, this pattern might be
 useful. So maybe this functionality should only be enabled in those cases.

 2. What about if this behavior will be configurable? More specifically
 what about if the user will be able to control:

 * explicitly enable this functionary since otherwise Django cannot
 guarantee data integrity for all use-cases. This will force the user to
 explicitly acknowledge they want to do this hence eliminating the burden
 from Django to always provide data-integrity (and explain all the cases
 when it is not provided).
 * what goes to inner and outer subquery ({{{ORDER BY}}} or
 {{{DISTINCT}}}). Maybe even allow {{{ORDER BY}}} in both queries. So the
 user will pick some sort columns for both inner and outer query hence
 solving "predictability" of {{{DISTINCT}}} and allowing to sort overall
 results.

 I do like the idea of making these things customizable however not sure
 what would be an API to do those customizations. Is there precedent like
 that in Django ORM for query customizations?

 3. Perhaps the right solution is try not to solve my particular use-case
 but provide a more generic solution to use nested queries. Maybe use API
 something like:

 {{{#!python
 Model.objects.nested(
     Model.objects.filter(...).distinct(...).order_by(...)  # inner
 ).order_by(...)  # outer
 }}}

 Let me know your thoughts.

--
Ticket URL: <https://code.djangoproject.com/ticket/24218#comment:6>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.7bf7c62086128dcf5e15e2a63a608be7%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to