#36157: Unusual behaviour when pre-fetching with only applied on the related
fields
-------------------------------+--------------------------------------
Reporter: Tim McCurrach | Owner: (none)
Type: Bug | Status: new
Component: Uncategorized | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Description changed by Tim McCurrach:
Old description:
> When prefetching related models. If you apply `.only()` to the related
> queryset, django performs additional lookups for related IDs that have
> been left out of the `only`.
>
> It is probably easiest to explain the issue with an example. Suppose you
> have these models:
>
> {{{
> class Blog(models.Model):
> name = models.CharField(max_length=100)
>
> class Post(models.Model):
> name = models.CharField(max_length=100)
> blog = models.ForeignKey(Blog, on_delete=models.CASCADE,
> related_name="posts")
> ...lots of other big fields
> }}}
> And you create a few items for each:
> {{{
> blog = Blog.objects.create(name="Django Tricks")
> blog2 = Blog.objects.create(name="React Tricks")
> Post.objects.create(name="prefetching", blog=blog)
> Post.objects.create(name="models", blog=blog)
> Post.objects.create(name="templates", blog=blog)
> Post.objects.create(name="hooks", blog=blog2)
> Post.objects.create(name="components", blog=blog2)
> }}}
> If I wish to pre-fetch the posts for some blogs, but only want the names
> of each post, rather than the content of each post I can do the
> following:
> {{{
> Blog.objects.prefetch_related(Prefetch("posts",
> queryset=Post.objects.only("name")))
> }}}
> I would expect this to result in just 2 database queries. One to fetch
> the data for the `Blog` instances, and another to fetch the the data for
> the related `Post`s. Instead, there is an n+1 issue where there are 5
> extra follow up requests for each of the related `Post` instances. This
> is the SQL that is generated:
> {{{
> SELECT "app_blog"."id", "app_blog"."name" FROM "app_blog" LIMIT 21
> SELECT "app_post"."id", "app_post"."name" FROM "app_post" WHERE
> "app_post"."blog_id" IN (1, 2)
> SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
> "app_post"."id" = 1 LIMIT 21
> SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
> "app_post"."id" = 2 LIMIT 21
> SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
> "app_post"."id" = 3 LIMIT 21
> SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
> "app_post"."id" = 4 LIMIT 21
> SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
> "app_post"."id" = 5 LIMIT 21
> }}}
> I can understand it might be a good idea to have the related-id's for the
> blog on hand should you need them later. But I also think, that by using
> `.only()` you are explicitly telling django - I don't need these. This is
> a real problem for larger data-sets, where you end up with thousands of
> extra round-trips to the database.
New description:
When prefetching related models. If you apply `.only()` to the related
queryset, django performs additional lookups for related IDs that have
been left out of the `only`.
It is probably easiest to explain the issue with an example.
=== Example Situation ===
Suppose you have these models:
{{{
class Blog(models.Model):
name = models.CharField(max_length=100)
class Post(models.Model):
name = models.CharField(max_length=100)
blog = models.ForeignKey(Blog, on_delete=models.CASCADE,
related_name="posts")
...lots of other big fields
}}}
And you create a few items for each:
{{{
blog = Blog.objects.create(name="Django Tricks")
blog2 = Blog.objects.create(name="React Tricks")
Post.objects.create(name="prefetching", blog=blog)
Post.objects.create(name="models", blog=blog)
Post.objects.create(name="templates", blog=blog)
Post.objects.create(name="hooks", blog=blog2)
Post.objects.create(name="components", blog=blog2)
}}}
If I wish to pre-fetch the posts for some blogs, but only want the names
of each post, rather than the content of each post I can do the following:
{{{
Blog.objects.prefetch_related(Prefetch("posts",
queryset=Post.objects.only("name")))
}}}
I would expect this to result in just 2 database queries. One to fetch the
data for the `Blog` instances, and another to fetch the the data for the
related `Post`s. Instead, there is an n+1 issue where there are 5 extra
follow up requests for each of the related `Post` instances. This is the
SQL that is generated:
{{{
SELECT "app_blog"."id", "app_blog"."name" FROM "app_blog" LIMIT 21
SELECT "app_post"."id", "app_post"."name" FROM "app_post" WHERE
"app_post"."blog_id" IN (1, 2)
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 1 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 2 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 3 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 4 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
"app_post"."id" = 5 LIMIT 21
}}}
I can understand it might be a good idea to have the related-id's for the
blog on hand should you need them later. But I also think, that by using
`.only()` you are explicitly telling django - I don't need these. This is
a real problem for larger data-sets, where you end up with thousands of
extra round-trips to the database.
=== Context ===
This is an issue that came up in the wild. I'm using a third-party
optimiser that improves the performance of graphQL queries by decorating
querysets with `only()`, `select_related()` etc. It correctly identifies
that I am only using certain fields and applies `only()` to them, knowing
I will never need to access certain related fields. This results in django
producing many additional hits to the database.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/36157#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/django-updates/01070194b42cb700-ec48ce38-65fe-4648-90d5-03ba9dd506f9-000000%40eu-central-1.amazonses.com.