#25789: Inefficient Queries Generated due to not using WHERE EXISTS ----------------------------------------------+-------------------- Reporter: cancan101 | Owner: nobody Type: Uncategorized | Status: new Component: Database layer (models, ORM) | Version: 1.8 Severity: Normal | Keywords: Triage Stage: Unreviewed | Has patch: 0 Easy pickings: 0 | UI/UX: 0 ----------------------------------------------+-------------------- [http://stackoverflow.com/questions/33764737/django-equivalent-of- sqlalchemy-any-to-filter-where-exists/33765323 Reposting question from SO] with some more details.
I believe that the Django ORM is generating seriously inefficient SQL due to it not using `WHERE EXISTS` but instead using a `DISTINCT` with a `LEFT JOIN`. by comparison, SQLAlchemy will use `WHERE EXISTS`. I have two models, `Exam` and `Series`. `Series` objects have a foreign key to an `Exam` object. Both of the models contain a field `description_user`. I am trying to search for all `Exam`s that have a search term in `description_user` or have a child `Series` with that term in its `description_user`. I want to do this for a number of search terms (requiring all of them). I also want to de-duplicate the results (ie not get the same Exam multiple times). This is roughly what the filter looks like: {{{ a = (Q(**{'series__description_user__icontains': 'bar'}) | Q(**{'description_user__icontains': 'bar'})) b = (Q(**{'series__description_user__icontains': 'foo'}) | Q(**{'description_user__icontains': 'foo'})) c = (Q(**{'series__description_user__icontains': 'baz'}) | Q(**{'description_user__icontains': 'baz'})) Exam.objects.filter(a & b & c).distinct() }}} with corresponding SQL: {{{ SELECT DISTINCT "exam_storage_exam"."id", "exam_storage_exam"."description_user" FROM "exam_storage_exam" LEFT OUTER JOIN "exam_storage_series" ON ( "exam_storage_exam"."id" = "exam_storage_series"."exam_id" AND ( "exam_storage_series"."removed" IS NULL) ) WHERE ( "exam_storage_exam"."removed" IS NULL AND ( "exam_storage_series"."description_user" LIKE %s ESCAPE \'\\\' OR "exam_storage_exam"."description_user" LIKE %s ESCAPE \'\\\') AND ( "exam_storage_series"."description_user" LIKE %s ESCAPE \'\\\' OR "exam_storage_exam"."description_user" LIKE %s ESCAPE \'\\\') AND ( "exam_storage_series"."description_user" LIKE %s ESCAPE \'\\\' OR "exam_storage_exam"."description_user" LIKE %s ESCAPE \'\\\')) }}} The issue is that as the number of search terms grows, the size of the intermediate data set before the DISTINCT operation grows as well. Ideally the SQL would look like: {{{ SELECT * FROM exam WHERE (EXISTS (SELECT 1 FROM exam_storage_series WHERE exam.id = series.exam_id AND ( series.description_user LIKE '%foo%' )) or exam.description_user LIKE '%foo%') AND (EXISTS (SELECT 1 FROM exam_storage_series WHERE exam.id = series.exam_id AND ( series.description_user LIKE '%bar%' )) or exam.description_user LIKE '%bar%') AND (EXISTS (SELECT 1 FROM exam_storage_series WHERE exam.id = series.exam_id AND ( series.description_user LIKE '%baz%' )) or exam.description_user LIKE '%baz%') }}} Currently the performance of Django query is terrible. This style searching comes up for example in how [https://github.com/tomchristie /django-rest- framework/blob/43c45cc9391ec2bed9481a8b309990dec35b6ac8/rest_framework/filters.py#L132-L180 DRF generates search queries]. -- Ticket URL: <https://code.djangoproject.com/ticket/25789> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-updates+unsubscr...@googlegroups.com. To post to this group, send email to django-updates@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/052.deaca43690dd267f8c7b080d5fe53416%40djangoproject.com. For more options, visit https://groups.google.com/d/optout.