#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
     Reporter:  Márton Salomváry     |                    Owner:  (none)
         Type:                       |                   Status:  closed
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  5.0
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:  invalid
     Keywords:  chained-filters      |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

 * cc: Simon Charette (added)
 * keywords:  queryset filter regression => chained-filters
 * resolution:   => invalid
 * status:  new => closed
 * type:  Bug => Cleanup/optimization

Comment:

 Hello Márton Salomváry, thank you for your report, as well as for the test
 project and bisected revision number.

 In general, Django does not make guarantees about the exact SQL that is
 generated. This means SQL can change from version to version, and we would
 not consider that a bug as long as the semantics of the query remain the
 same (which, based on my testing, is the case here). The resulting
 queryset still returns the correct values.

 That said, we do care about potential performance impacts and/or
 regressions. When triaging, I focused on how this change affects query
 planning, and I can confirm that the query plan is different/worse (see
 EXPLAIN ANALYZE results for Postgres comparing `4.2.x` vs `main`). I do
 wonder, however, whether the way the queries are currently written might
 be more complex than necessary, which could contribute to these
 differences in execution plans. For example, this alternative queryset is,
 IMHO, equivalent and much more straightfoward:

 {{{#!python
 Alpha.objects.filter(
     Q(bravos__charlie_bravos__charlie=9999)
     & Exists(
         Delta.objects.filter(
             bravos__id=OuterRef("bravos__charlie_bravos__bravo_id")
         )
     )
 )
 }}}

 Because of the above, I'll close the ticket as `invalid`. If you encounter
 a case where a query written in a way that aligns with Django's ORM
 patterns (as shown above) returns incorrect results or performs poorly,
 please feel free to reopen it with details. Thanks again!

 === Appendix ===
 `EXPLAIN ANALYZE` for `stable/4.2.x`:
 {{{
 Nested Loop  (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.013
 rows=1 loops=1)
   Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name
   Inner Unique: true
   ->  Nested Loop  (cost=7.64..41.29 rows=6 width=4) (actual
 time=0.009..0.011 rows=1 loops=1)
         Output: test_filter_regression_bravo_alphas.alpha_id
         ->  Nested Loop  (cost=7.49..40.26 rows=2 width=12) (actual
 time=0.008..0.010 rows=1 loops=1)
               Output: test_filter_regression_bravo.id, u1.bravo_id,
 test_filter_regression_charliebravo.bravo_id
               Inner Unique: true
               Join Filter: (test_filter_regression_bravo.id =
 test_filter_regression_charliebravo.bravo_id)
               ->  Nested Loop Semi Join  (cost=7.22..39.05 rows=2 width=8)
 (actual time=0.007..0.008 rows=1 loops=1)
                     Output: test_filter_regression_charliebravo.bravo_id,
 u1.bravo_id
                     ->  Bitmap Heap Scan on
 public.test_filter_regression_charliebravo  (cost=4.17..11.28 rows=3
 width=4) (actual time=0.003..0.003 rows=2 loops=1)
                           Output: test_filter_regression_charliebravo.id,
 test_filter_regression_charliebravo.name,
 test_filter_regression_charliebravo.charlie_id,
 test_filter_regression_charliebravo.bravo_id
                           Recheck Cond:
 (test_filter_regression_charliebravo.charlie_id = 9999)
                           Heap Blocks: exact=1
                           ->  Bitmap Index Scan on
 test_filter_regression_charliebravo_charlie_id_1f151049  (cost=0.00..4.17
 rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
                                 Index Cond:
 (test_filter_regression_charliebravo.charlie_id = 9999)
                     ->  Nested Loop  (cost=3.05..13.34 rows=540 width=4)
 (actual time=0.002..0.002 rows=0 loops=2)
                           Output: u1.bravo_id
                           Inner Unique: true
                           ->  Bitmap Heap Scan on
 public.test_filter_regression_delta_bravos u1  (cost=2.90..11.43 rows=10
 width=8) (actual time=0.001..0.001 rows=0 loops=2)
                                 Output: u1.id, u1.delta_id, u1.bravo_id
                                 Recheck Cond:
 (test_filter_regression_charliebravo.bravo_id = u1.bravo_id)
                                 Heap Blocks: exact=1
                                 ->  Bitmap Index Scan on
 test_filter_regression_delta_bravos_bravo_id_5b518e55  (cost=0.00..2.89
 rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2)
                                       Index Cond: (u1.bravo_id =
 test_filter_regression_charliebravo.bravo_id)
                           ->  Index Only Scan using
 test_filter_regression_delta_pkey on public.test_filter_regression_delta
 u0  (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1
 loops=1)
                                 Output: u0.id
                                 Index Cond: (u0.id = u1.delta_id)
                                 Heap Fetches: 1
               ->  Index Only Scan using test_filter_regression_bravo_pkey
 on public.test_filter_regression_bravo  (cost=0.28..0.59 rows=1 width=4)
 (actual time=0.001..0.001 rows=1 loops=1)
                     Output: test_filter_regression_bravo.id
                     Index Cond: (test_filter_regression_bravo.id =
 u1.bravo_id)
                     Heap Fetches: 1
         ->  Index Scan using
 test_filter_regression_bravo_alphas_bravo_id_042061e7 on
 public.test_filter_regression_bravo_alphas  (cost=0.15..0.42 rows=10
 width=8) (actual time=0.001..0.001 rows=1 loops=1)
               Output: test_filter_regression_bravo_alphas.id,
 test_filter_regression_bravo_alphas.bravo_id,
 test_filter_regression_bravo_alphas.alpha_id
               Index Cond: (test_filter_regression_bravo_alphas.bravo_id =
 test_filter_regression_bravo.id)
   ->  Index Scan using test_filter_regression_alpha_pkey on
 public.test_filter_regression_alpha  (cost=0.28..0.32 rows=1 width=122)
 (actual time=0.001..0.001 rows=1 loops=1)
         Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name
         Index Cond: (test_filter_regression_alpha.id =
 test_filter_regression_bravo_alphas.alpha_id)
 Planning Time: 0.382 ms
 Execution Time: 0.027 ms
 }}}

 For `main`:
 {{{
 Hash Join  (cost=79.87..131.14 rows=20 width=122) (actual
 time=0.074..0.171 rows=1 loops=1)
   Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name
   Inner Unique: true
   Hash Cond: (t6.bravo_id = u1.bravo_id)
   ->  Nested Loop  (cost=12.32..63.27 rows=40 width=134) (actual
 time=0.046..0.143 rows=1 loops=1)
         Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name, t6.bravo_id, t7.id, t8.bravo_id
         ->  Nested Loop  (cost=12.17..52.62 rows=42 width=130) (actual
 time=0.041..0.137 rows=1 loops=1)
               Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name, t6.bravo_id, t7.id
               Inner Unique: true
               ->  Nested Loop  (cost=11.90..39.10 rows=42 width=126)
 (actual time=0.034..0.130 rows=1 loops=1)
                     Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name, t6.bravo_id
                     Join Filter: (test_filter_regression_alpha.id =
 t6.alpha_id)
                     ->  Nested Loop  (cost=11.74..33.86 rows=11 width=126)
 (actual time=0.030..0.126 rows=1 loops=1)
                           Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name,
 test_filter_regression_bravo_alphas.alpha_id
                           Inner Unique: true
                           ->  Nested Loop  (cost=11.47..30.32 rows=11
 width=4) (actual time=0.026..0.121 rows=1 loops=1)
                                 Output:
 test_filter_regression_bravo_alphas.alpha_id
                                 ->  Hash Join  (cost=11.32..28.77 rows=3
 width=8) (actual time=0.018..0.112 rows=2 loops=1)
                                       Output:
 test_filter_regression_bravo.id,
 test_filter_regression_charliebravo.bravo_id
                                       Hash Cond:
 (test_filter_regression_bravo.id =
 test_filter_regression_charliebravo.bravo_id)
                                       ->  Seq Scan on
 public.test_filter_regression_bravo  (cost=0.00..15.40 rows=540 width=4)
 (actual time=0.003..0.051 rows=900 loops=1)
                                             Output:
 test_filter_regression_bravo.id, test_filter_regression_bravo.name
                                       ->  Hash  (cost=11.28..11.28 rows=3
 width=4) (actual time=0.006..0.006 rows=2 loops=1)
                                             Output:
 test_filter_regression_charliebravo.bravo_id
                                             Buckets: 1024  Batches: 1
 Memory Usage: 9kB
                                             ->  Bitmap Heap Scan on
 public.test_filter_regression_charliebravo  (cost=4.17..11.28 rows=3
 width=4) (actual time=0.003..0.003 rows=2 loops=1)
                                                   Output:
 test_filter_regression_charliebravo.bravo_id
                                                   Recheck Cond:
 (test_filter_regression_charliebravo.charlie_id = 9999)
                                                   Heap Blocks: exact=1
                                                   ->  Bitmap Index Scan on
 test_filter_regression_charliebravo_charlie_id_1f151049  (cost=0.00..4.17
 rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
                                                         Index Cond:
 (test_filter_regression_charliebravo.charlie_id = 9999)
                                 ->  Index Scan using
 test_filter_regression_bravo_alphas_bravo_id_042061e7 on
 public.test_filter_regression_bravo_alphas  (cost=0.15..0.42 rows=10
 width=8) (actual time=0.004..0.004 rows=0 loops=2)
                                       Output:
 test_filter_regression_bravo_alphas.id,
 test_filter_regression_bravo_alphas.bravo_id,
 test_filter_regression_bravo_alphas.alpha_id
                                       Index Cond:
 (test_filter_regression_bravo_alphas.bravo_id =
 test_filter_regression_bravo.id)
                           ->  Index Scan using
 test_filter_regression_alpha_pkey on public.test_filter_regression_alpha
 (cost=0.28..0.32 rows=1 width=122) (actual time=0.004..0.004 rows=1
 loops=1)
                                 Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name
                                 Index Cond:
 (test_filter_regression_alpha.id =
 test_filter_regression_bravo_alphas.alpha_id)
                     ->  Index Scan using
 test_filter_regression_bravo_alphas_alpha_id_b2c31687 on
 public.test_filter_regression_bravo_alphas t6  (cost=0.15..0.35 rows=10
 width=8) (actual time=0.004..0.004 rows=1 loops=1)
                           Output: t6.id, t6.bravo_id, t6.alpha_id
                           Index Cond: (t6.alpha_id =
 test_filter_regression_bravo_alphas.alpha_id)
               ->  Index Only Scan using test_filter_regression_bravo_pkey
 on public.test_filter_regression_bravo t7  (cost=0.28..0.32 rows=1
 width=4) (actual time=0.007..0.007 rows=1 loops=1)
                     Output: t7.id
                     Index Cond: (t7.id = t6.bravo_id)
                     Heap Fetches: 1
         ->  Index Only Scan using
 test_filter_regression_charliebravo_bravo_id_1b159238 on
 public.test_filter_regression_charliebravo t8  (cost=0.15..0.22 rows=3
 width=4) (actual time=0.005..0.005 rows=1 loops=1)
               Output: t8.bravo_id
               Index Cond: (t8.bravo_id = t6.bravo_id)
               Heap Fetches: 1
   ->  Hash  (cost=65.05..65.05 rows=200 width=4) (actual time=0.020..0.021
 rows=2 loops=1)
         Output: u1.bravo_id
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  HashAggregate  (cost=63.05..65.05 rows=200 width=4) (actual
 time=0.017..0.017 rows=2 loops=1)
               Output: u1.bravo_id
               Group Key: u1.bravo_id
               Batches: 1  Memory Usage: 40kB
               ->  Hash Join  (cost=22.15..57.95 rows=2040 width=4) (actual
 time=0.015..0.016 rows=2 loops=1)
                     Output: u1.bravo_id
                     Inner Unique: true
                     Hash Cond: (u1.delta_id = u0.id)
                     ->  Seq Scan on
 public.test_filter_regression_delta_bravos u1  (cost=0.00..30.40 rows=2040
 width=8) (actual time=0.001..0.001 rows=2 loops=1)
                           Output: u1.id, u1.delta_id, u1.bravo_id
                     ->  Hash  (cost=15.40..15.40 rows=540 width=4) (actual
 time=0.004..0.004 rows=2 loops=1)
                           Output: u0.id
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on
 public.test_filter_regression_delta u0  (cost=0.00..15.40 rows=540
 width=4) (actual time=0.001..0.001 rows=2 loops=1)
                                 Output: u0.id
 Planning Time: 2.462 ms
 Execution Time: 0.238 ms
 }}}

 For `main` with optimized queryset:
 {{{
 Nested Loop  (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.012
 rows=1 loops=1)
   Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name
   Inner Unique: true
   ->  Nested Loop  (cost=7.64..41.29 rows=6 width=4) (actual
 time=0.009..0.011 rows=1 loops=1)
         Output: test_filter_regression_bravo_alphas.alpha_id
         ->  Nested Loop  (cost=7.49..40.26 rows=2 width=12) (actual
 time=0.008..0.010 rows=1 loops=1)
               Output: test_filter_regression_bravo.id, u1.bravo_id,
 test_filter_regression_charliebravo.bravo_id
               Inner Unique: true
               Join Filter: (test_filter_regression_bravo.id =
 test_filter_regression_charliebravo.bravo_id)
               ->  Nested Loop Semi Join  (cost=7.22..39.05 rows=2 width=8)
 (actual time=0.006..0.008 rows=1 loops=1)
                     Output: test_filter_regression_charliebravo.bravo_id,
 u1.bravo_id
                     ->  Bitmap Heap Scan on
 public.test_filter_regression_charliebravo  (cost=4.17..11.28 rows=3
 width=4) (actual time=0.003..0.003 rows=2 loops=1)
                           Output: test_filter_regression_charliebravo.id,
 test_filter_regression_charliebravo.name,
 test_filter_regression_charliebravo.charlie_id,
 test_filter_regression_charliebravo.bravo_id
                           Recheck Cond:
 (test_filter_regression_charliebravo.charlie_id = 9999)
                           Heap Blocks: exact=1
                           ->  Bitmap Index Scan on
 test_filter_regression_charliebravo_charlie_id_1f151049  (cost=0.00..4.17
 rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
                                 Index Cond:
 (test_filter_regression_charliebravo.charlie_id = 9999)
                     ->  Nested Loop  (cost=3.05..13.34 rows=540 width=4)
 (actual time=0.002..0.002 rows=0 loops=2)
                           Output: u1.bravo_id
                           Inner Unique: true
                           ->  Bitmap Heap Scan on
 public.test_filter_regression_delta_bravos u1  (cost=2.90..11.43 rows=10
 width=8) (actual time=0.001..0.001 rows=0 loops=2)
                                 Output: u1.id, u1.delta_id, u1.bravo_id
                                 Recheck Cond:
 (test_filter_regression_charliebravo.bravo_id = u1.bravo_id)
                                 Heap Blocks: exact=1
                                 ->  Bitmap Index Scan on
 test_filter_regression_delta_bravos_bravo_id_5b518e55  (cost=0.00..2.89
 rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2)
                                       Index Cond: (u1.bravo_id =
 test_filter_regression_charliebravo.bravo_id)
                           ->  Index Only Scan using
 test_filter_regression_delta_pkey on public.test_filter_regression_delta
 u0  (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1
 loops=1)
                                 Output: u0.id
                                 Index Cond: (u0.id = u1.delta_id)
                                 Heap Fetches: 1
               ->  Index Only Scan using test_filter_regression_bravo_pkey
 on public.test_filter_regression_bravo  (cost=0.28..0.59 rows=1 width=4)
 (actual time=0.001..0.001 rows=1 loops=1)
                     Output: test_filter_regression_bravo.id
                     Index Cond: (test_filter_regression_bravo.id =
 u1.bravo_id)
                     Heap Fetches: 1
         ->  Index Scan using
 test_filter_regression_bravo_alphas_bravo_id_042061e7 on
 public.test_filter_regression_bravo_alphas  (cost=0.15..0.42 rows=10
 width=8) (actual time=0.001..0.001 rows=1 loops=1)
               Output: test_filter_regression_bravo_alphas.id,
 test_filter_regression_bravo_alphas.bravo_id,
 test_filter_regression_bravo_alphas.alpha_id
               Index Cond: (test_filter_regression_bravo_alphas.bravo_id =
 test_filter_regression_bravo.id)
   ->  Index Scan using test_filter_regression_alpha_pkey on
 public.test_filter_regression_alpha  (cost=0.28..0.32 rows=1 width=122)
 (actual time=0.001..0.001 rows=1 loops=1)
         Output: test_filter_regression_alpha.id,
 test_filter_regression_alpha.name
         Index Cond: (test_filter_regression_alpha.id =
 test_filter_regression_bravo_alphas.alpha_id)
 Planning Time: 0.381 ms
 Execution Time: 0.027 ms
 }}}
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/django-updates/01070198bddf0a0f-eb4187b0-0010-4d22-9125-a5c8a4c27e29-000000%40eu-central-1.amazonses.com.

Reply via email to