#25464: Allow skipping IN clause on prefetch queries
-------------------------------------+-------------------------------------
     Reporter:  ecederstrand         |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  1                    |      Needs documentation:  1
  Needs tests:  1                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Description changed by timgraham:

Old description:

> When using prefetch_related() on a large queryset, the prefetch query SQL
> can be inefficient. Consider this:
>
>     Category.objects.filter(type=5).prefetch_related('items')
>
> If 100.000 categories have type=5, then an IN clause with 100.000
> Category IDs is generated to get the Item objects. Even with a custom
> queryset using a Prefetch() object, the IN clause is generated, even
> though it is A) redundant, B) sends a potentially multi-megabyte SQL
> statement over the wire for the database to process, C) may confuse the
> query planner to generate an inefficient execution plan, and D) doesn't
> scale:
>
>     Category.objects.filter(type=5).prefetch_related(Prefetch('items',
> queryset=Item.objects.filter(category__item=5)))
>
> Pull request https://github.com/django/django/pull/5356 adds the
> possibility to skip the IN clause in cases where we are sure that a
> better queryset will get (at least) the same items as the IN clause
> would:
>
>     Category.objects.filter(type=5).prefetch_related(Prefetch('items',
> queryset=Item.objects.filter(category__item=5),
> filter_on_instances=False))
>
> In my tests, this speeds up prefetch_related() by 20x-50x on large
> querysets.

New description:

 When using prefetch_related() on a large queryset, the prefetch query SQL
 can be inefficient. Consider this:
 {{{
     Category.objects.filter(type=5).prefetch_related('items')
 }}}
 If 100,000 categories have type=5, an IN clause with 100,000 Category IDs
 is generated to fetch the Item objects. The IN clause is generated even
 when a custom queryset is supplied through a Prefetch() object, even
 though the clause A) is redundant, B) sends a potentially multi-megabyte
 SQL statement over the wire for the database to process, C) may lead the
 query planner to choose an inefficient execution plan, and D) doesn't scale:
 {{{
     Category.objects.filter(type=5).prefetch_related(
         Prefetch('items', queryset=Item.objects.filter(category__type=5)))
 }}}
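To put a rough number on point B, here is a back-of-the-envelope sketch in plain Python. It assumes the IDs are inlined as text; `in_clause_size` is a hypothetical helper, not part of Django, and the real bytes on the wire depend on the database driver and how it binds parameters.

```python
def in_clause_size(ids):
    """Return the character length of an SQL IN (...) list with inlined IDs."""
    return len("IN (" + ", ".join(str(i) for i in ids) + ")")


# Sequential integer keys 1..100,000, matching the example in the description.
int_size = in_clause_size(range(1, 100_001))

# Assumption for comparison: 36-character UUID keys, each in single quotes.
uuid_size = in_clause_size(["'" + "0" * 36 + "'"] * 100_000)

print(f"integer keys: {int_size / 1e6:.2f} MB")
print(f"UUID keys:    {uuid_size / 1e6:.2f} MB")
```

Under these assumptions the list alone comes to roughly 0.7 MB for sequential integer keys and about 4 MB for UUID keys, before the rest of the statement is counted.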
 Pull request https://github.com/django/django/pull/5356 adds an option to
 skip the IN clause when we are sure that the custom queryset returns at
 least the same items as the IN clause would:
 {{{
     Category.objects.filter(type=5).prefetch_related(
         Prefetch('items',
                  queryset=Item.objects.filter(category__type=5),
                  filter_on_instances=False))
 }}}
 In my tests, this speeds up prefetch_related() by 20x-50x on large
 querysets.
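The equivalence the proposal relies on can be checked outside Django. The sketch below uses the standard-library `sqlite3` module with made-up `category` and `item` tables (the table and column names are assumptions, not Django's generated schema). It compares an IN-based prefetch query with one that repeats the parent filter through a join, then groups the rows by parent in Python, which is roughly what a prefetch has to do after fetching.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, type INTEGER NOT NULL);
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES category(id)
    );
""")

# 200 categories whose type cycles through 0..9; three items per category.
conn.executemany(
    "INSERT INTO category (id, type) VALUES (?, ?)",
    [(i, i % 10) for i in range(1, 201)],
)
conn.executemany(
    "INSERT INTO item (category_id) VALUES (?)",
    [(c,) for c in range(1, 201) for _ in range(3)],
)

# Step 1: fetch the parent rows, as the outer queryset would.
category_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM category WHERE type = 5 ORDER BY id")
]

# Step 2a: IN-based prefetch, one placeholder per parent ID.
placeholders = ", ".join("?" * len(category_ids))
with_in = conn.execute(
    "SELECT id, category_id FROM item "
    f"WHERE category_id IN ({placeholders}) ORDER BY id",
    category_ids,
).fetchall()

# Step 2b: the same rows, selected by repeating the parent filter via a join.
with_join = conn.execute(
    "SELECT item.id, item.category_id FROM item "
    "JOIN category ON category.id = item.category_id "
    "WHERE category.type = 5 ORDER BY item.id"
).fetchall()

same_rows = with_in == with_join

# Step 3: attach items to their parents in Python, keyed by the foreign key.
items_by_category = defaultdict(list)
for item_id, category_id in with_join:
    items_by_category[category_id].append(item_id)

print(same_rows, len(category_ids), len(with_join))
```

The join query has a constant size no matter how many categories match, while the IN query grows by one parameter per parent row. This sketch says nothing about the 20x-50x figure, which comes from the reporter's own tests.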

--
Ticket URL: <https://code.djangoproject.com/ticket/25464#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
