#36157: Unusual behaviour when pre-fetching with only applied on the related 
fields
-------------------------------+-----------------------------------------
     Reporter:  Tim McCurrach  |                     Type:  Bug
       Status:  new            |                Component:  Uncategorized
      Version:  5.1            |                 Severity:  Normal
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+-----------------------------------------
 When prefetching related models. If you apply `.only()` to the related
 queryset, django performs additional lookups for related IDs that have
 been left out of the `only`.

 It is probably easiest to explain the issue with an example. Suppose you
 have these models:

 {{{
 class Blog(models.Model):
     name = models.CharField(max_length=100)

 class Post(models.Model):
     name = models.CharField(max_length=100)
     blog = models.ForeignKey(Blog, on_delete=models.CASCADE,
 related_name="posts")
     ...lots of other big fields
 }}}
 And you create a few items for each:
 {{{
 blog = Blog.objects.create(name="Django Tricks")
 blog2 = Blog.objects.create(name="React Tricks")
 Post.objects.create(name="prefetching", blog=blog)
 Post.objects.create(name="models", blog=blog)
 Post.objects.create(name="templates", blog=blog)
 Post.objects.create(name="hooks", blog=blog2)
 Post.objects.create(name="components", blog=blog2)
 }}}
 If I wish to pre-fetch the posts for some blogs, but only want the names
 of each post, rather than the content of each post I can do the following:
 {{{
 Blog.objects.prefetch_related(Prefetch("posts",
 queryset=Post.objects.only("name")))
 }}}
 I would expect this to result in just 2 database queries. One to fetch the
 data for the `Blog` instances, and another to fetch the the data for the
 related `Post`s. Instead, there is an n+1 issue where there are 5 extra
 follow up requests for each of the related `Post` instances. This is the
 SQL that is generated:
 {{{
 SELECT "app_blog"."id", "app_blog"."name" FROM "app_blog" LIMIT 21
 SELECT "app_post"."id", "app_post"."name" FROM "app_post" WHERE
 "app_post"."blog_id" IN (1, 2)
 SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
 "app_post"."id" = 1 LIMIT 21
 SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
 "app_post"."id" = 2 LIMIT 21
 SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
 "app_post"."id" = 3 LIMIT 21
 SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
 "app_post"."id" = 4 LIMIT 21
 SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE
 "app_post"."id" = 5 LIMIT 21
 }}}
 I can understand it might be a good idea to have the related-id's for the
 blog on hand should you need them later. But I also think, that by using
 `.only()` you are explicitly telling django - I don't need these. This is
 a real problem for larger data-sets, where you end up with thousands of
 extra round-trips to the database.
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36157>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/django-updates/01070194b4264431-aaa5ed5b-aca1-4385-bdc0-a609a9a13b38-000000%40eu-central-1.amazonses.com.

Reply via email to