#25789: Inefficient Queries Generated due to not using WHERE EXISTS
----------------------------------------------+--------------------
     Reporter:  cancan101                     |      Owner:  nobody
         Type:  Uncategorized                 |     Status:  new
    Component:  Database layer (models, ORM)  |    Version:  1.8
     Severity:  Normal                        |   Keywords:
 Triage Stage:  Unreviewed                    |  Has patch:  0
Easy pickings:  0                             |      UI/UX:  0
----------------------------------------------+--------------------
 [http://stackoverflow.com/questions/33764737/django-equivalent-of-
 sqlalchemy-any-to-filter-where-exists/33765323 Reposting question from SO]
 with some more details.

 I believe that the Django ORM is generating seriously inefficient SQL due
 to it not using `WHERE EXISTS` but instead using a `DISTINCT` with a `LEFT
 JOIN`. by comparison, SQLAlchemy will use `WHERE EXISTS`.

 I have two models, `Exam` and `Series`. `Series` objects have a foreign
 key to an `Exam` object. Both of the models contain a field
 `description_user`. I am trying to search for all `Exam`s that have a
 search term in `description_user` or have a child `Series` with that term
 in its `description_user`. I want to do this for a number of search terms
 (requiring all of them). I also want to de-duplicate the results (ie not
 get the same Exam multiple times).

 This is roughly what the filter looks like:

 {{{
 a = (Q(**{'series__description_user__icontains': 'bar'}) |
 Q(**{'description_user__icontains': 'bar'}))
 b = (Q(**{'series__description_user__icontains': 'foo'}) |
 Q(**{'description_user__icontains': 'foo'}))
 c = (Q(**{'series__description_user__icontains': 'baz'}) |
 Q(**{'description_user__icontains': 'baz'}))
 Exam.objects.filter(a & b & c).distinct()
 }}}

 with corresponding SQL:

 {{{
 SELECT DISTINCT
                 "exam_storage_exam"."id",
                 "exam_storage_exam"."description_user"
 FROM            "exam_storage_exam"
 LEFT OUTER JOIN "exam_storage_series"
 ON              (
                                 "exam_storage_exam"."id" =
 "exam_storage_series"."exam_id"
                 AND             (
 "exam_storage_series"."removed" IS NULL) )
 WHERE           (
                                 "exam_storage_exam"."removed" IS NULL
                 AND             (
 "exam_storage_series"."description_user" LIKE %s ESCAPE \'\\\'
                                 OR
 "exam_storage_exam"."description_user" LIKE %s ESCAPE \'\\\')
                 AND             (
 "exam_storage_series"."description_user" LIKE %s ESCAPE \'\\\'
                                 OR
 "exam_storage_exam"."description_user" LIKE %s ESCAPE \'\\\')
                 AND             (
 "exam_storage_series"."description_user" LIKE %s ESCAPE \'\\\'
                                 OR
 "exam_storage_exam"."description_user" LIKE %s ESCAPE \'\\\'))

 }}}

 The issue is that as the number of search terms grows, the size of the
 intermediate data set before the DISTINCT operation grows as well.

 Ideally the SQL would look like:
 {{{
 SELECT *
  FROM exam
    WHERE (EXISTS (SELECT 1
         FROM exam_storage_series
           WHERE exam.id = series.exam_id AND (
             series.description_user LIKE '%foo%'
           )) or exam.description_user LIKE '%foo%') AND
 (EXISTS (SELECT 1
         FROM exam_storage_series
           WHERE exam.id = series.exam_id AND (
             series.description_user LIKE '%bar%'
           )) or exam.description_user LIKE '%bar%') AND
 (EXISTS (SELECT 1
         FROM exam_storage_series
           WHERE exam.id = series.exam_id AND (
             series.description_user LIKE '%baz%'
           )) or exam.description_user LIKE '%baz%')
 }}}

 Currently the performance of Django query is terrible. This style
 searching comes up for example in how [https://github.com/tomchristie
 /django-rest-
 
framework/blob/43c45cc9391ec2bed9481a8b309990dec35b6ac8/rest_framework/filters.py#L132-L180
 DRF generates search queries].

--
Ticket URL: <https://code.djangoproject.com/ticket/25789>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/052.deaca43690dd267f8c7b080d5fe53416%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to