#33973: Performance regression when moving from 3.1 to 3.2
-------------------------------------+-------------------------------------
     Reporter:  Marc Parizeau        |                    Owner:  nobody
         Type:  Uncategorized        |                   Status:  closed
    Component:  Database layer       |                  Version:  3.2
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:  needsinfo
     Keywords:  performance          |             Triage Stage:
  regression                         |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

 Bonjour Marc,

 There are some sources listed on the
 [https://github.com/django/django/pull/13606 PR]  that introduced these
 changes that documents the performance benefits of using `EXISTS` over
 `IN` and the case where `IN` will be faster which I believe is what you're
 experiencing here.

 Given there are performance benefits in most cases and `IN` is only faster
 [https://www.percona.com/blog/2020/04/16/sql-optimizations-in-postgresql-
 in-vs-exists-vs-any-all-vs-join/ when the subquery returns a small number
 of rows] as a hashmap can be created I believe we should keep the change
 around.

 The good news is that you can restore the previous behaviour by being more
 explicit in your usage of `exclude` by using an explicit `IN` lookup
 instead.

 {{{#!python
 user_thread_entries = ThreadEntry.objects.filter(user=user,
 thread__isnull=False).values("thread")
 Content.objects.annotate(
     unread=Exists(
         Thread.objects.filter(
             content=OuterRef('pk'),
         ).exclude(pk__in=user_thread_entries)
     )
 ).filter(
     unread=True,
 )
 }}}

 Keep in mind though that, as pointed out in the article, this query will
 only be faster for users with a relatively small number of thread entries
 as Postgres will turn the set of rows into a hashmap which is means that
 this query will become more expensive as the number of threads user
 creates increase.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33973#comment:7>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/0107018300ca64f2-0224d45b-3d9a-49bd-826c-d4ed86803f1a-000000%40eu-central-1.amazonses.com.

Reply via email to