#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Lorand Varga):
Simon, I want to say that I really appreciate your support and patience on
this. If I've been too persistent in my comments, it was not on purpose;
I'm genuinely curious whether or not there is an underlying issue in
Django, since there are a lot of interactions going on. I'm not insisting
on any revert unless one is needed, and your arguments were solid. I'm very
grateful for your help.
Getting back to the nitty-gritty details: turning off the parallel workers
does not improve anything.
The NOT (Exists) query:
{{{
SET max_parallel_workers_per_gather = 0
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published"
AND NOT "test_app_blog"."api_has_translation"
AND NOT (
EXISTS(
SELECT
(1) AS "a"
FROM
"test_app_blog" U0
LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
U1."blog_id")
WHERE
(
U1."id" IS NULL
AND U0."id" = "test_app_blog"."id"
)
LIMIT
1
)
)
)
ORDER BY
"test_app_blog"."updated_at" DESC
Nested Loop Anti Join (cost=1.13..132780.57 rows=180586 width=1985)
(actual time=211403.273..3034890.176 rows=4 loops=1)
Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id =
test_app_blog.id))
Rows Removed by Join Filter: 16853800739
-> Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on
test_app_blog (cost=0.42..107115.43 rows=180586 width=1985) (actual
time=1.080..1356.688 rows=179902 loops=1)
Filter: (is_published AND (NOT api_has_translation))
Rows Removed by Filter: 27456
-> Materialize (cost=0.71..22504.89 rows=1 width=4) (actual
time=0.002..9.083 rows=93684 loops=179902)
-> Merge Left Join (cost=0.71..22504.88 rows=1 width=4) (actual
time=0.016..140.731 rows=194102 loops=1)
Merge Cond: (u0.id = u1.blog_id)
Filter: (u1.id IS NULL)
Rows Removed by Filter: 59794
-> Index Only Scan using test_app_blog_pkey on
test_app_blog u0 (cost=0.42..15170.42 rows=207267 width=4) (actual
time=0.007..67.559 rows=207358 loops=1)
Heap Fetches: 33518
-> Index Scan using test_app_translation_51c6d5db on
test_app_translation u1 (cost=0.29..6077.43 rows=59809 width=8) (actual
time=0.005..30.515 rows=59794 loops=1)
Planning Time: 0.936 ms
Execution Time: 3034891.393 ms
(16 rows)
}}}
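The ~3,035 s runtime of this plan follows directly from the Materialize node: it replays roughly 93,684 subquery rows for each of the 179,902 outer rows, which lines up with the 16,853,800,739 rows removed by the join filter. A quick sanity check of that arithmetic (all figures taken from the EXPLAIN output above):

```python
# Numbers copied from the EXPLAIN ANALYZE output above.
outer_rows = 179_902        # rows surviving the outer index scan's filter
materialized_rows = 93_684  # avg rows replayed per loop by Materialize

# Every outer row is compared against every materialized row.
comparisons = outer_rows * materialized_rows
print(f"{comparisons:,} join-filter evaluations")  # ~16.85 billion
```

That is why the anti join spends nearly an hour on a table of only ~200k rows: the cost is quadratic in practice, not linear.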
The plain Exists query does perform well:
{{{
SET max_parallel_workers_per_gather = 0
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published"
AND NOT "test_app_blog"."api_has_translation"
AND EXISTS(
SELECT
(1) AS "a"
FROM
"test_app_translation" U1
WHERE
U1."blog_id" = "test_app_blog"."id"
LIMIT
1
)
)
ORDER BY
"test_app_blog"."updated_at" DESC
Sort (cost=21037.38..21066.08 rows=11479 width=22) (actual
time=63.818..63.820 rows=4 loops=1)
Sort Key: test_app_blog.updated_at DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1501.37..20263.31 rows=11479 width=22) (actual
time=17.974..63.805 rows=4 loops=1)
-> HashAggregate (cost=1500.95..1632.70 rows=13175 width=4)
(actual time=17.018..20.483 rows=13256 loops=1)
Group Key: u1.blog_id
Batches: 1 Memory Usage: 1425kB
-> Index Only Scan using test_app_translation_51c6d5db on
test_app_translation u1 (cost=0.29..1351.42 rows=59809 width=4) (actual
time=0.007..6.721 rows=59794 loops=1)
Heap Fetches: 735
-> Index Scan using test_app_blog_pkey on test_app_blog
(cost=0.42..1.47 rows=1 width=22) (actual time=0.003..0.003 rows=0
loops=13256)
Index Cond: (id = u1.blog_id)
Filter: (is_published AND (NOT api_has_translation))
Rows Removed by Filter: 1
Planning Time: 0.703 ms
Execution Time: 63.872 ms
}}}
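To confirm that the two WHERE clauses are logically equivalent (so the difference above is purely a planning problem, not a correctness one), here is a minimal self-contained sketch using sqlite3. The table names and sample data are simplified assumptions, not the ticket's real schema, and SQLite's planner differs from Postgres's; the point is only that both predicates select the same rows:

```python
import sqlite3

# Toy schema mirroring the ticket's tables (names simplified).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog (id INTEGER PRIMARY KEY, is_published INT, api_has_translation INT);
CREATE TABLE translation (id INTEGER PRIMARY KEY, blog_id INT REFERENCES blog(id));
INSERT INTO blog VALUES (1, 1, 0), (2, 1, 0), (3, 1, 1), (4, 0, 0);
INSERT INTO translation VALUES (10, 1);  -- only blog 1 has a translation
""")

# Django's split-exclude form: NOT EXISTS over a LEFT JOIN plus IS NULL.
split_exclude = conn.execute("""
SELECT b.id FROM blog b
WHERE b.is_published AND NOT b.api_has_translation
  AND NOT EXISTS (
    SELECT 1 FROM blog u0
    LEFT OUTER JOIN translation u1 ON u0.id = u1.blog_id
    WHERE u1.id IS NULL AND u0.id = b.id)
ORDER BY b.id
""").fetchall()

# The direct form: a plain EXISTS against the child table.
plain_exists = conn.execute("""
SELECT b.id FROM blog b
WHERE b.is_published AND NOT b.api_has_translation
  AND EXISTS (SELECT 1 FROM translation u1 WHERE u1.blog_id = b.id)
ORDER BY b.id
""").fetchall()

print(split_exclude, plain_exists)  # both queries return [(1,)]
```

Both forms keep exactly the published, untranslated-flagged blogs that have a translation row, so rewriting one into the other only changes what the planner is given to work with.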
I also played with various values for work_mem and haven't seen any change
in the Postgres plans.
I also want to mention that David is on to something: with materialization
disabled, the same NOT EXISTS query finishes in well under a second.
{{{
SET enable_material='off';
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published"
AND NOT "test_app_blog"."api_has_translation"
AND NOT (
EXISTS(
SELECT
(1) AS "a"
FROM
"test_app_blog" U0
LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
U1."blog_id")
WHERE
(
U1."id" IS NULL
AND U0."id" = "test_app_blog"."id"
)
LIMIT
1
)
)
)
ORDER BY
"test_app_blog"."updated_at" DESC;
Sort (cost=143578.39..144029.85 rows=180585 width=22) (actual
time=691.697..691.802 rows=4 loops=1)
Sort Key: test_app_blog.updated_at DESC
Sort Method: quicksort Memory: 25kB
-> Hash Anti Join (cost=21645.24..125693.23 rows=180585 width=22)
(actual time=306.930..691.769 rows=4 loops=1)
Hash Cond: (test_app_blog.id = u0.id)
-> Index Scan using test_app_blog_is_published_4b47c652_uniq on
test_app_blog (cost=0.42..101768.51 rows=180586 width=22) (actual
time=0.684..304.445 rows=179902 loops=1)
Index Cond: (is_published = true)
Filter: (NOT api_has_translation)
Rows Removed by Filter: 12043
-> Hash (cost=21644.81..21644.81 rows=1 width=4) (actual
time=283.265..283.369 rows=194102 loops=1)
Buckets: 262144 (originally 1024) Batches: 2 (originally
1) Memory Usage: 6145kB
-> Gather Merge (cost=1000.73..21644.81 rows=1 width=4)
(actual time=6.321..245.263 rows=194102 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Merge Left Join (cost=0.71..20644.67 rows=1
width=4) (actual time=0.067..105.796 rows=64701 loops=3)
Merge Cond: (u0.id = u1.blog_id)
Filter: (u1.id IS NULL)
Rows Removed by Filter: 19931
-> Parallel Index Only Scan using
test_app_blog_pkey on test_app_blog u0 (cost=0.42..13961.37 rows=86361
width=4) (actual time=0.030..40.520 rows=69119 loops=3)
Heap Fetches: 33632
-> Index Scan using
test_app_translation_51c6d5db on test_app_translation u1
(cost=0.29..6077.43 rows=59809 width=8) (actual time=0.032..42.470
rows=59627 loops=3)
Planning Time: 2.751 ms
Execution Time: 692.507 ms
}}}
I'm not sure yet why my Postgres seems restricted (I remember it having
"enough" RAM), but I will get back with a comment once I find out more.
--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:9>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.