#32996: Cache PathInfos on ForeignObject/ForeignObjectRel (et al) -------------------------------------+------------------------------------- Reporter: Keryn | Owner: Keryn Knight Knight | Type: | Status: assigned Cleanup/optimization | Component: Database | Version: dev layer (models, ORM) | Severity: Normal | Keywords: Triage Stage: | Has patch: 0 Unreviewed | Needs documentation: 0 | Needs tests: 0 Patch needs improvement: 0 | Easy pickings: 0 UI/UX: 0 | -------------------------------------+------------------------------------- In most usages of `get_path_info` (across it's various implementations) the `filtered_relation` parameter isn't used. The places it does potentially get given a filtered relation look to be (cursory sweep): - `django.db.models.sql.query.Query.names_to_path` - `django.db.models.fields.related.ManyToManyField._get_path_info` Whenever `get_path_info()` is used (that is, without giving any arguments) it's essentially a fixed result based on a few attributes of the parent object (the `ForeignObject` or whatever), which are permanently held across the lifetime of the process AFAIK (that is, eg: `id(User.groups.rel)` is stable)
So calling `get_path_info` everywhere when there's no filtered relation being used is expensive: {{{ In [2]: %timeit User.groups.rel.get_path_info() 2.27 µs ± 37.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each }}} By defining additionally (and the opposite for places where `get_reverse_path_info` is defined): {{{ @cached_property def path_infos(self): return self.get_path_info() }}} we can amortize that cost over the life of the process such that: {{{ In [3]: %timeit User.groups.rel.path_infos 164 ns ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each }}} Thus, given 101 Users, each with 1 user permission, and 1 group (also with single permission) as outlined here: {{{ perms = tuple(Permission.objects.all()) for x in range(n): y = get_random_string(10) user = User.objects.create(username=y, email=f'{y}@{y}', password=y) random_group_perm = random.choice(perms) random_user_perm = random.choice(perms) user.user_permissions.add(random_user_perm) group = Group.objects.create(name=y) group.permissions.add(random_group_perm) user.groups.add(group) }}} we can execute the following before changing over to using a cached property: {{{ In [2]: %prun -stime -lpath_info for _ in range(100): tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions')) ncalls tottime percall cumtime percall filename:lineno(function) 31500 0.033 0.000 0.048 0.000 related.py:728(get_path_info) 600 0.002 0.000 0.009 0.000 related.py:1545(_get_path_info) 600 0.002 0.000 0.003 0.000 related.py:920(get_reverse_path_info) 600 0.001 0.000 0.009 0.000 related.py:1574(get_reverse_path_info) 600 0.000 0.000 0.010 0.000 reverse_related.py:197(get_path_info) }}} and then after: {{{ Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 3 0.000 0.000 0.000 0.000 related.py:1559(_get_path_info) 3 0.000 0.000 0.000 0.000 related.py:934(get_reverse_path_info) 4 0.000 0.000 0.000 0.000 related.py:734(get_path_info) 3 0.000 0.000 0.000 0.000 related.py:1592(get_reverse_path_info) 3 0.000 0.000 0.000 0.000 reverse_related.py:207(get_path_info) 4 0.000 0.000 0.000 0.000 related.py:748(path_infos) 3 0.000 0.000 0.000 0.000 related.py:766(reverse_path_infos) 3 0.000 0.000 0.000 0.000 reverse_related.py:210(path_infos) }}} Note that those `ncalls` aren't a ''true'' reflection, because after the cached property replaces itself subsequent attribute accesses won't be tracked, hence the `timeit` information above. Across the test suite, beforehand: {{{ > python -m cProfile runtests.py -v0 | grep "path_info" ... 211111 0.227 0.000 0.335 0.000 related.py:728(get_path_info) 5349 0.021 0.000 0.073 0.000 related.py:1545(_get_path_info) 6804 0.019 0.000 0.027 0.000 related.py:920(get_reverse_path_info) 3469 0.006 0.000 0.013 0.000 wsgi.py:152(get_path_info) Unrelated 5568 0.005 0.000 0.068 0.000 reverse_related.py:197(get_path_info) 3969 0.004 0.000 0.058 0.000 related.py:1574(get_reverse_path_info) 1380 0.001 0.000 0.021 0.000 related.py:1571(get_path_info) 124 0.000 0.000 0.001 0.000 fields.py:399(get_path_info) 113 0.000 0.000 0.000 0.000 related.py:742(get_reverse_path_info) 27 0.000 0.000 0.000 0.000 fields.py:416(get_reverse_path_info) 6 0.000 0.000 0.000 0.000 fields.py:359(_get_path_info_with_parent) 15 0.000 0.000 0.000 0.000 empty_join.py:54(get_path_info) 10 0.000 0.000 0.000 0.000 empty_join.py:67(get_reverse_path_info) 1 0.000 0.000 0.002 0.002 tests.py:23(test_bad_path_info Unrelated 1 0.000 0.000 0.000 0.000 tests.py:196(test_environ_path_info_type) Unrelated 1 0.000 0.000 0.000 0.000 tests.py:164(test_wsgirequest_path_info) Unrelated 2 0.000 0.000 0.000 0.000 request.py:148(get_full_path_info) Unrelated }}} and after switching to a cached property: {{{ > python -m cProfile runtests.py -v0 | grep "path_info" ... 3469 0.006 0.000 0.013 0.000 wsgi.py:152(get_path_info) Unrelated 873 0.003 0.000 0.004 0.000 related.py:734(get_path_info) 396 0.002 0.000 0.003 0.000 related.py:934(get_reverse_path_info) 203 0.001 0.000 0.006 0.000 related.py:1559(_get_path_info) 843 0.001 0.000 0.004 0.000 related.py:748(path_infos) 342 0.001 0.000 0.006 0.000 reverse_related.py:207(get_path_info) 256 0.000 0.000 0.006 0.000 reverse_related.py:210(path_infos) 196 0.000 0.000 0.002 0.000 related.py:766(reverse_path_infos) 130 0.000 0.000 0.004 0.000 related.py:1592(get_reverse_path_info) 73 0.000 0.000 0.002 0.000 related.py:1585(get_path_info) 17 0.000 0.000 0.000 0.000 fields.py:399(get_path_info) 67 0.000 0.000 0.002 0.000 related.py:1588(path_infos) 7 0.000 0.000 0.000 0.000 related.py:752(get_reverse_path_info) 15 0.000 0.000 0.000 0.000 fields.py:416(path_infos) 4 0.000 0.000 0.000 0.000 fields.py:420(get_reverse_path_info) 1 0.000 0.000 0.001 0.001 tests.py:23(test_bad_path_info) Unrelated 2 0.000 0.000 0.000 0.000 fields.py:359(_get_path_info_with_parent) 1 0.000 0.000 0.000 0.000 tests.py:196(test_environ_path_info_type) Unrelated 2 0.000 0.000 0.000 0.000 empty_join.py:54(get_path_info) 1 0.000 0.000 0.000 0.000 tests.py:164(test_wsgirequest_path_info) Unrelated 1 0.000 0.000 0.000 0.000 empty_join.py:67(get_reverse_path_info) 2 0.000 0.000 0.000 0.000 request.py:148(get_full_path_info) Unrelated }}} As you can see, even within the test suite there are a substantial number of calls to effectively get the same information, across the many and various test models. Obviously if you look at the timing numbers, it's all kind of insubstantial. It's really just work that doesn't need repeatedly doing, I don't think. I have a branch which I'll tidy up and push to check CI and allow for discussion, should it be accepted. -- Ticket URL: <https://code.djangoproject.com/ticket/32996> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-updates+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/052.d906a57b8565aa8eec488d083b3c5bae%40djangoproject.com.