#34597: Django ORM query SQL generation changed (and is apparently a lot slower)
-------------------------------------+-------------------------------------
     Reporter:  Lorand Varga         |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  3.2
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Lorand Varga):

 * status:  closed => new
 * resolution:  needsinfo =>


Comment:

 Adding more information:

 Django 2.2 behavior:

 {{{
 EXPLAIN ANALYZE
 SELECT
   "test_app_blog"."id",
   "test_app_blog"."created_at",
   "test_app_blog"."updated_at",
   "test_app_blog"."is_published",
   "test_app_blog"."api_has_translation"
 FROM
   "test_app_blog"
 WHERE
   (
     "test_app_blog"."is_published" = True
     AND "test_app_blog"."api_has_translation" = False
     AND NOT (
       "test_app_blog"."id" IN (
         SELECT
           U0."id"
         FROM
           "test_app_blog" U0
           LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
 U1."blog_id")
         WHERE
           (
             U1."id" IS NULL
             AND U0."id" = ("test_app_blog"."id")
           )
       )
     )
   )
 ORDER BY
   "test_app_blog"."updated_at" DESC

 Sort  (cost=1047131.99..1047357.72 rows=90293 width=1985) (actual
 time=888.530..888.533 rows=4 loops=1)
   Sort Key: test_app_blog.updated_at DESC
   Sort Method: quicksort  Memory: 33kB
   ->  Index Scan using test_app_blog_is_published_4b47c652_uniq on
 test_app_blog  (cost=0.42..995257.82 rows=90293 width=1985) (actual
 time=72.414..888.496 rows=4 loops=1)
         Index Cond: (is_published = true)
         Filter: ((NOT api_has_translation) AND (NOT (SubPlan 1)))
         Rows Removed by Filter: 191941
         SubPlan 1
           ->  Nested Loop Left Join  (cost=0.71..8.57 rows=1 width=4)
 (actual time=0.003..0.003 rows=1 loops=179902)
                 Join Filter: (u0.id = u1.blog_id)
                 Filter: (u1.id IS NULL)
                 Rows Removed by Filter: 0
                 ->  Index Only Scan using test_app_blog_pkey on
 test_app_blog u0  (cost=0.42..2.44 rows=1 width=4) (actual
 time=0.002..0.002 rows=1 loops=179902)
                       Index Cond: (id = test_app_blog.id)
                       Heap Fetches: 27765
                 ->  Index Scan using test_app_translation_51c6d5db on
 test_app_translation u1  (cost=0.29..6.07 rows=5 width=8) (actual
 time=0.001..0.001 rows=0 loops=179902)
                       Index Cond: (blog_id = test_app_blog.id)
 Planning Time: 0.500 ms
 Execution Time: 888.606 ms
 }}}

 Django 3.2 (and 4.2) behavior:
 {{{
 EXPLAIN ANALYZE
 SELECT
   "test_app_blog"."id",
   "test_app_blog"."created_at",
   "test_app_blog"."updated_at",
   "test_app_blog"."is_published",
   "test_app_blog"."api_has_translation"
 FROM
   "test_app_blog"
 WHERE
   (
     "test_app_blog"."is_published"
     AND NOT "test_app_blog"."api_has_translation"
     AND NOT (
       EXISTS(
         SELECT
           (1) AS "a"
         FROM
           "test_app_blog" U0
           LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
 U1."blog_id")
         WHERE
           (
             U1."id" IS NULL
             AND U0."id" = "test_app_blog"."id"
             AND "test_app_blog"."id" = "test_app_blog"."id"
           )
         LIMIT
           1
       )
     )
   )
 ORDER BY
   "test_app_blog"."updated_at" DESC

  Nested Loop Anti Join  (cost=1001.15..131927.02 rows=180586 width=1985)
 (actual time=208264.409..3034435.256 rows=4 loops=1)
    Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id =
 test_app_blog.id))
    Rows Removed by Join Filter: 16853800739
    ->  Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on
 test_app_blog  (cost=0.42..107115.43 rows=180586 width=1985) (actual
 time=1.080..17454.750 rows=179902 loops=1)
          Filter: (is_published AND (NOT api_has_translation))
          Rows Removed by Filter: 27456
    ->  Materialize  (cost=1000.73..21651.35 rows=1 width=4) (actual
 time=0.002..9.009 rows=93684 loops=179902)
          ->  Gather Merge  (cost=1000.73..21651.34 rows=1 width=4) (actual
 time=6.024..192.937 rows=194102 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Merge Left Join  (cost=0.71..20651.20 rows=1 width=4)
 (actual time=0.046..87.573 rows=64701 loops=3)
                      Merge Cond: (u0.id = u1.blog_id)
                      Filter: (u1.id IS NULL)
                      Rows Removed by Filter: 19931
                      ->  Parallel Index Only Scan using test_app_blog_pkey
 on test_app_blog u0  (cost=0.42..13961.37 rows=86361 width=4) (actual
 time=0.018..32.796 rows=69119 loops=3)
                            Heap Fetches: 33004
                      ->  Index Scan using test_app_translation_51c6d5db on
 test_app_translation u1  (cost=0.29..6082.91 rows=60175 width=8) (actual
 time=0.025..32.500 rows=58545 loops=3)
  Planning Time: 0.882 ms
  Execution Time: 3034436.784 ms
 }}}

 As can be seen, there is a huge increase in execution time.
 I can't vouch there isn't something else going on here also but currently
 the only thing changed is the django version.
 Anything else in the environment is left untouched. (And django2.2
 generated query is performant, while django 3.2 query is awfull).
 I've managed to use your advice and modify the query to force the subquery
 to use IN but that leads to synthax that is more complex, while the django
 2.2 synthax is a lot simpler: ~Q(translation=None)

 The only synthax that manages to replicate the query correctly (and
 performance wise) seems to be
 {{{qs.exclude(id__in=Blog.objects.filter(translation=None,
 id=OuterRef("id")))}}}}

 While not terrible I do think something has been broken in django since
 the ~Q(translation=None) seems a lot cleaner and more ORMish.

 Thanks,
 Lorand.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/0107018862eff40b-49dabfd3-5832-432a-a027-42fa088f1f7d-000000%40eu-central-1.amazonses.com.

Reply via email to