#32996: Cache PathInfos on ForeignObject/ForeignObjectRel (et al)
-------------------------------------+-------------------------------------
               Reporter:  Keryn      |          Owner:  Keryn Knight
  Knight                             |
                   Type:             |         Status:  assigned
  Cleanup/optimization               |
              Component:  Database   |        Version:  dev
  layer (models, ORM)                |
               Severity:  Normal     |       Keywords:
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 In most usages of `get_path_info` (across it's various implementations)
 the `filtered_relation` parameter isn't used. The places it does
 potentially get given a filtered relation look to be (cursory sweep):
 - `django.db.models.sql.query.Query.names_to_path`
 - `django.db.models.fields.related.ManyToManyField._get_path_info`
 Whenever `get_path_info()` is used (that is, without giving any arguments)
 it's essentially a fixed result based on a few attributes of the parent
 object (the `ForeignObject` or whatever), which are permanently held
 across the lifetime of the process AFAIK (that is, eg:
 `id(User.groups.rel)` is stable)

 So calling `get_path_info` everywhere when there's no filtered relation
 being used is expensive:
 {{{
 In [2]: %timeit User.groups.rel.get_path_info()
 2.27 µs ± 37.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each
 }}}

 By defining additionally (and the opposite for places where
 `get_reverse_path_info` is defined):
 {{{
 @cached_property
     def path_infos(self):
         return self.get_path_info()
 }}}
 we can amortize that cost over the life of the process such that:
 {{{
 In [3]: %timeit User.groups.rel.path_infos
 164 ns ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each
 }}}

 Thus, given 101 Users, each with 1 user permission, and 1 group (also with
 single permission) as outlined here:
 {{{
 perms = tuple(Permission.objects.all())
 for x in range(n):
     y = get_random_string(10)
     user = User.objects.create(username=y, email=f'{y}@{y}', password=y)
     random_group_perm = random.choice(perms)
     random_user_perm = random.choice(perms)
     user.user_permissions.add(random_user_perm)
     group = Group.objects.create(name=y)
     group.permissions.add(random_group_perm)
     user.groups.add(group)
 }}}
 we can execute the following before changing over to using a cached
 property:
 {{{
 In [2]: %prun -stime -lpath_info for _ in range(100):
 tuple(User.objects.prefetch_related('groups', 'user_permissions',
 'groups__permissions'))
 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  31500    0.033    0.000    0.048    0.000 related.py:728(get_path_info)
    600    0.002    0.000    0.009    0.000 related.py:1545(_get_path_info)
    600    0.002    0.000    0.003    0.000
 related.py:920(get_reverse_path_info)
    600    0.001    0.000    0.009    0.000
 related.py:1574(get_reverse_path_info)
    600    0.000    0.000    0.010    0.000
 reverse_related.py:197(get_path_info)
 }}}
 and then after:
 {{{
 Ordered by: internal time
 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      3    0.000    0.000    0.000    0.000 related.py:1559(_get_path_info)
      3    0.000    0.000    0.000    0.000
 related.py:934(get_reverse_path_info)
      4    0.000    0.000    0.000    0.000 related.py:734(get_path_info)
      3    0.000    0.000    0.000    0.000
 related.py:1592(get_reverse_path_info)
      3    0.000    0.000    0.000    0.000
 reverse_related.py:207(get_path_info)
      4    0.000    0.000    0.000    0.000 related.py:748(path_infos)
      3    0.000    0.000    0.000    0.000
 related.py:766(reverse_path_infos)
      3    0.000    0.000    0.000    0.000
 reverse_related.py:210(path_infos)
 }}}
 Note that those `ncalls` aren't a ''true'' reflection, because after the
 cached property replaces itself subsequent attribute accesses won't be
 tracked, hence the `timeit` information above.

 Across the test suite, beforehand:
 {{{
 > python -m cProfile runtests.py -v0 | grep "path_info"
 ...
    211111    0.227    0.000    0.335    0.000
 related.py:728(get_path_info)
      5349    0.021    0.000    0.073    0.000
 related.py:1545(_get_path_info)
      6804    0.019    0.000    0.027    0.000
 related.py:920(get_reverse_path_info)
      3469    0.006    0.000    0.013    0.000 wsgi.py:152(get_path_info)
 Unrelated
      5568    0.005    0.000    0.068    0.000
 reverse_related.py:197(get_path_info)
      3969    0.004    0.000    0.058    0.000
 related.py:1574(get_reverse_path_info)
      1380    0.001    0.000    0.021    0.000
 related.py:1571(get_path_info)
       124    0.000    0.000    0.001    0.000 fields.py:399(get_path_info)
       113    0.000    0.000    0.000    0.000
 related.py:742(get_reverse_path_info)
        27    0.000    0.000    0.000    0.000
 fields.py:416(get_reverse_path_info)
         6    0.000    0.000    0.000    0.000
 fields.py:359(_get_path_info_with_parent)
        15    0.000    0.000    0.000    0.000
 empty_join.py:54(get_path_info)
        10    0.000    0.000    0.000    0.000
 empty_join.py:67(get_reverse_path_info)
         1    0.000    0.000    0.002    0.002
 tests.py:23(test_bad_path_info            Unrelated
         1    0.000    0.000    0.000    0.000
 tests.py:196(test_environ_path_info_type) Unrelated
         1    0.000    0.000    0.000    0.000
 tests.py:164(test_wsgirequest_path_info)  Unrelated
         2    0.000    0.000    0.000    0.000
 request.py:148(get_full_path_info)        Unrelated
 }}}
 and after switching to a cached property:
 {{{
 > python -m cProfile runtests.py -v0 | grep "path_info"
 ...
      3469    0.006    0.000    0.013    0.000 wsgi.py:152(get_path_info)
 Unrelated
       873    0.003    0.000    0.004    0.000
 related.py:734(get_path_info)
       396    0.002    0.000    0.003    0.000
 related.py:934(get_reverse_path_info)
       203    0.001    0.000    0.006    0.000
 related.py:1559(_get_path_info)
       843    0.001    0.000    0.004    0.000 related.py:748(path_infos)
       342    0.001    0.000    0.006    0.000
 reverse_related.py:207(get_path_info)
       256    0.000    0.000    0.006    0.000
 reverse_related.py:210(path_infos)
       196    0.000    0.000    0.002    0.000
 related.py:766(reverse_path_infos)
       130    0.000    0.000    0.004    0.000
 related.py:1592(get_reverse_path_info)
        73    0.000    0.000    0.002    0.000
 related.py:1585(get_path_info)
        17    0.000    0.000    0.000    0.000 fields.py:399(get_path_info)
        67    0.000    0.000    0.002    0.000 related.py:1588(path_infos)
         7    0.000    0.000    0.000    0.000
 related.py:752(get_reverse_path_info)
        15    0.000    0.000    0.000    0.000 fields.py:416(path_infos)
         4    0.000    0.000    0.000    0.000
 fields.py:420(get_reverse_path_info)
         1    0.000    0.000    0.001    0.001
 tests.py:23(test_bad_path_info)           Unrelated
         2    0.000    0.000    0.000    0.000
 fields.py:359(_get_path_info_with_parent)
         1    0.000    0.000    0.000    0.000
 tests.py:196(test_environ_path_info_type) Unrelated
         2    0.000    0.000    0.000    0.000
 empty_join.py:54(get_path_info)
         1    0.000    0.000    0.000    0.000
 tests.py:164(test_wsgirequest_path_info)  Unrelated
         1    0.000    0.000    0.000    0.000
 empty_join.py:67(get_reverse_path_info)
         2    0.000    0.000    0.000    0.000
 request.py:148(get_full_path_info)        Unrelated
 }}}

 As you can see, even within the test suite there are a substantial number
 of calls to effectively get the same information, across the many and
 various test models. Obviously if you look at the timing numbers, it's all
 kind of insubstantial. It's really just work that doesn't need repeatedly
 doing, I don't think.

 I have a branch which I'll tidy up and push to check CI and allow for
 discussion, should it be accepted.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/32996>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/052.d906a57b8565aa8eec488d083b3c5bae%40djangoproject.com.

Reply via email to