#25464: Allow skipping IN clause on prefetch queries
-------------------------------------+-------------------------------------
Reporter: ecederstrand | Owner: nobody
Type: New feature | Status: new
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 1 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by shaib):
* cc: shaib (added)
Old description:
> When using prefetch_related() on a large queryset, the prefetch query SQL
> can be inefficient. Consider this:
> {{{
> Category.objects.filter(type=5).prefetch_related('items')
> }}}
> If 100.000 categories have type=5, then an IN clause with 100.000
> Category IDs is generated to get the Item objects. Even with a custom
> queryset using a Prefetch() object, the IN clause is generated, even
> though it is A) redundant, B) sends a potentially multi-megabyte SQL
> statement over the wire for the database to process, C) may confuse the
> query planner to generate an inefficient execution plan, and D) doesn't
> scale:
> {{{
> Category.objects.filter(type=5).prefetch_related(Prefetch('items',
> queryset=Item.objects.filter(category__item=5)))
> }}}
> Pull request https://github.com/django/django/pull/5356 adds the
> possibility to skip the IN clause in cases where we are sure that a
> better queryset will get (at least) the same items as the IN clause
> would:
> {{{
> Category.objects.filter(type=5).prefetch_related(Prefetch('items',
> queryset=Item.objects.filter(category__item=5),
> filter_on_instances=False))
> }}}
> In my tests, this speeds up prefetch_related() by 20x-50x on large
> querysets.
New description:
When using prefetch_related() on a large queryset, the prefetch query SQL
can be inefficient. Consider this:
{{{
Category.objects.filter(type=5).prefetch_related('items')
}}}
If 100,000 categories have type=5, then an IN clause with 100,000 Category
IDs is generated to get the Item objects. Even with a custom queryset
using a Prefetch() object, the IN clause is still generated, even though it
A) is redundant, B) sends a potentially multi-megabyte SQL statement over
the wire for the database to process, C) may confuse the query planner into
generating an inefficient execution plan, and D) doesn't scale:
{{{
Category.objects.filter(type=5).prefetch_related(Prefetch('items',
queryset=Item.objects.filter(category__type=5)))
}}}
Pull request https://github.com/django/django/pull/5356 adds the
possibility to skip the IN clause in cases where we are sure that a better
queryset will get (at least) the same items as the IN clause would:
{{{
Category.objects.filter(type=5).prefetch_related(Prefetch('items',
queryset=Item.objects.filter(category__type=5),
filter_on_instances=False))
}}}
In my tests, this speeds up prefetch_related() by 20x-50x on large
querysets.
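
To give a rough feel for point B, here is a minimal sketch in plain Python
(no Django required) that builds a parameterized IN query of the general
shape described above and estimates how many bytes the statement and its
parameters take. The table name `app_item`, the column name `category_id`
and both helper functions are assumptions made for illustration; this is
not the SQL Django actually emits.

```python
def build_in_clause(table, column, ids):
    """Return (sql, params) for a parameterized IN query.

    Hypothetical helper: only approximates the shape of a prefetch query.
    """
    ids = list(ids)
    placeholders = ", ".join(["%s"] * len(ids))
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
    return sql, ids


def approx_wire_bytes(sql, params):
    """Estimate payload size: statement text plus each parameter as text."""
    return len(sql.encode()) + sum(len(str(p).encode()) for p in params)


# 100,000 sequential integer keys, as in the example above.
sql, params = build_in_clause("app_item", "category_id", range(1, 100_001))
print(len(params), approx_wire_bytes(sql, params))
```

With small integer keys the estimate lands in the hundreds of kilobytes;
longer keys (UUIDs, for instance) or more rows push it into megabytes.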
--
Comment:
Two notes come to mind:
1) While this may be less "natural" to think of, it seems the query would
be more direct and efficient as
{{{
Item.objects.filter(category__type=5).select_related('category')
}}}
Of course, this would require restructuring the code that handles the
items and categories.
2) Not sure if this is as easy, but I think a better and more general
alternative would be to expose the "joining-in-python" mechanism for
general use. I'm thinking along the lines of
{{{
cats = Category.objects.filter(type=5)
items = Item.objects.filter(category__type=5)
cats.use_prefetched('items', items)
}}}
where `'items'` is the name of the reverse relation, of course, and
`items` could be replaced with any iterable returning `Item` instances.
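
To make the second note more concrete, here is a minimal sketch of what
"joining in Python" could look like, written in plain Python so it runs
without Django. `use_prefetched` is the hypothetical name from the example
above, written here as a free function rather than a queryset method; the
`Category` and `Item` dataclasses are stand-ins for the real models, and
the `attr` and `fk` parameters are assumptions for illustration. This is
not how Django's prefetch machinery is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Category:  # stand-in for the real model
    id: int
    type: int
    items: list = field(default_factory=list)


@dataclass
class Item:  # stand-in for the real model
    id: int
    category_id: int


def use_prefetched(categories, related, attr="items", fk="category_id"):
    """Attach related objects to their parents by matching the foreign key.

    `related` may be any iterable; rows whose foreign key matches no
    category in `categories` are simply ignored.
    """
    grouped = defaultdict(list)
    for obj in related:
        grouped[getattr(obj, fk)].append(obj)
    for cat in categories:
        # .get() avoids creating empty entries in the defaultdict.
        setattr(cat, attr, grouped.get(cat.id, []))
    return categories


cats = [Category(1, 5), Category(2, 5), Category(3, 5)]
items = [Item(10, 1), Item(11, 1), Item(12, 2), Item(13, 99)]
use_prefetched(cats, items)
```

Because unmatched rows are dropped, the `items` iterable only needs to
contain at least the rows the IN clause would have returned, which is the
same condition the description places on `filter_on_instances=False`.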
--
Ticket URL: <https://code.djangoproject.com/ticket/25464#comment:5>