Re: [Django] #16937: Prefetch related objects

Django Sat, 01 Oct 2011 10:30:06 -0700

#16937: Prefetch related objects
-------------------------------------+-------------------------------------
     Reporter:  lukeplant            |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:  1.3
  (models, ORM)                      |               Resolution:
     Severity:  Normal               |             Triage Stage:  Accepted
     Keywords:                       |      Needs documentation:  0
    Has patch:  0                    |  Patch needs improvement:  0
  Needs tests:  0                    |                    UI/UX:  0
Easy pickings:  0                    |
-------------------------------------+-------------------------------------


Comment (by akaariai):

 So the API might look something like this in the future:
 {{{
 Books.objects.prefetch_related(
     Opts('young_readers', path='books__read_by', qs=filtering_qs,
 collapse=True),
     'another_related'
 )
 # or with tuple / namedtuple instead of Opts. The kwarg names are just
 placeholders.
 }}}

 I can live with that :)

 Here is the review, there is a lot of stuff here, but most (all?) of these
 are issues that do not need to be dealt with at the moment. They can be
 resolved in later tickets. I will mark "patch needs improvement" but feel
 free to just unset it if you feel that these issues are better dealt in
 later tickets.

  - About db_for_read usage: Is this usage correct?
 {{{
 [contrib/contentypes/generic.py:278 and db/models/fields/related.py:446]
 db = self._db or router.db_for_read(self.model, instance=instances[0])
 }}}
 Two possible problems: 1. the model and instance are of different classes,
 I don't know, maybe
 this is OK? 2. Only the first instance of the list is given as a hint, is
 this OK? I don't know multidb too well, so it might be this is OK.
  - Related to above: no multidb tests.

  - Attached is a failing tests - it is related to having more than 1000
 parameters in single SQL query. I think this should be fixed, but this can
 maybe be done in a later commit.

  - The patch assumes relation to the PK field of the model in m2m cases -
 this is not necessary the case. Attached is a failing test related to this
 and a successful test in foreign key case.

  - I don't understand what the following comment means, and there are some
 typos in it:
 {{{
 [modeltests/prefetch_related:141-146]
 The second prefetch_related lookup is a reverse foreign key. This means
 the query for it can
 only be something like "first_time_authors__pk__in = [...]" and cannot
 return more rows then
 the number of Author objects in the database.  This means that these
 objects will be reused
 (since in our data we've arranged for there len(authors1) >
 Author.objects.count())
 }}}

  - I don't like the usage of queryset len() to force the evaluation of the
 queryset. Better to have explicit method (.evaluate()) or something) and
 call it than rely on the implicit evaluation when len() is called. Maybe
 another ticket for this?

  - The current method of select_matching_instances can be slow. The method
 used here is effectively a nested loop, while using a dict would be
 considerably faster in many cases. The current method is O(nm) while dict
 would give O(n + m) but with higer memory cost. Currently fetching 1000
 objects and prefetching 5 related objects for each takes 3 seconds on my
 laptop. Just fetching the objects in two different querysets takes 0.2
 seconds. So the joining takes almost 2.8 seconds. In the end of this post
 is a profile of this run.

  - In models/query.py prefetch_related_objects there is this loop:
 {{{
       i = 0
       while i < len(fields):
           field = fields[i]
 }}}
    Am I missing something, or wouldn't "for field in fields" work here?

  - Same method as above: Is it good enough just to break here, or should
 the code raise an exception
    (not the AttributeError but something more helpful):
    {{{
         good_objects = True
         for obj in obj_list:
             if not hasattr(obj, '_prefetched_objects_cache'):
                 try:
                     obj._prefetched_objects_cache = {}
                 except AttributeError:
                     # Must be in a QuerySet subclass that is not returning
                     # Model instances, either in Django or 3rd
                     # party. prefetch_related() doesn't make sense, so
 quit
                     # now.
                     good_objects = False
                     break
         if not good_objects:
             break
 }}}
 Is it really necessary to loop through the whole obj_list? One would
 assume that they are homogenous, so checking just obj_list[0] would be
 enough?

  - In models/query.py around line 1646 there is this snippet:
    {{{
       for obj in instances:
           qs = getattr(obj, attname).all()
    }}}
    I am afraid this is somewhat expensive, as this will result in the
 following code executed for each
    obj:
    {{{
        return super(RelatedManager,
 self).get_query_set().using(db).filter(**(self.core_filters))
    }}}
    That is one QuerySet construction and two clones for each object. This
 is one good reason not to use
    relatedmanager.all() for prefetch cache... :)

  - Some tests for multi-table inheritance added in the attachment.

  - I have only skimmed the domcumentation.

 As noted in the beginning most of these issues are not blockers but things
 that might need work later. IMHO the code is high-quality, there are good
 tests and the documentation looks sane. Even if there are a lot of
 nitpicks above I think this is very close to be ready for commit.

 The promised profile:
 {{{
         1    0.055    0.055   22.647   22.647 speedtester.py:1(<module>)
 55540/30018    0.091    0.000   21.001    0.001 {len}
         1    0.000    0.000   20.967   20.967 transaction.py:206(inner)
         1    0.011    0.011   20.966   20.966 speedtester.py:24(managed)
       2/1    0.000    0.000   20.949   20.949 query.py:90(__iter__)
       3/2    0.002    0.001   20.949   10.474 query.py:75(__len__)
         1    0.000    0.000   20.824   20.824
 query.py:545(_prefetch_related_objects)
         1    0.002    0.002   20.824   20.824
 query.py:1534(prefetch_related_objects)
         1    0.026    0.026   20.819   20.819
 query.py:1627(_prefetch_one_level)
       999    9.552    0.010   18.146    0.018
 related.py:451(select_matching_instances)
 5011132/5010067    8.638    0.000    8.696    0.000 {getattr}
       999    0.021    0.000    2.016    0.002 related.py:457(all)
       999    0.005    0.000    1.992    0.002 manager.py:115(all)
       999    0.028    0.000    1.987    0.002
 related.py:434(get_query_set)
     11988    0.525    0.000    1.554    0.000 base.py:279(__init__)
      2005    0.027    0.000    1.392    0.001 query.py:837(_clone)
      2005    0.111    0.000    1.349    0.001 query.py:233(clone)

 }}}

-- 
Ticket URL: <https://code.djangoproject.com/ticket/16937#comment:12>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to django-updates@googlegroups.com.
To unsubscribe from this group, send email to 
django-updates+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en.

Re: [Django] #16937: Prefetch related objects

Reply via email to