#33025: Avoid accessing ConnectionsHandler.__getitem__ in Query.build_lookup unless necessary, for performance. -------------------------------------+------------------------------------- Reporter: Keryn | Owner: Keryn Knight Knight | Type: | Status: assigned Cleanup/optimization | Component: Database | Version: dev layer (models, ORM) | Severity: Normal | Keywords: Triage Stage: | Has patch: 1 Unreviewed | Needs documentation: 0 | Needs tests: 0 Patch needs improvement: 0 | Easy pickings: 0 UI/UX: 0 | -------------------------------------+------------------------------------- Currently, `build_lookup` contains: {{{ # For Oracle '' is equivalent to null. The check must be done at this # stage because join promotion can't be done in the compiler. Using # DEFAULT_DB_ALIAS isn't nice but it's the best that can be done here. # A similar thing is done in is_nullable(), too. if (connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls and lookup_name == 'exact' and lookup.rhs == ''): return lhs.get_lookup('isnull')(lhs, True) }}} but going through `connections` requires accessing the thread critical `asgiref.Local` which is only strictly necessary if the cheap comparisons are truthy and can be short-circuited like so: {{{ if (lookup_name == 'exact' and lookup.rhs == '' and connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls): return lhs.get_lookup('isnull')(lhs, True) }}}
This is coming off the back of the report in #33015, which is a nice pathological case we can use: {{{ In [1]: from django.db.models import Q In [2]: from testapp.models import ExampleModel In [3]: items_to_fetch = [(i, j) for i in range(0,100) for j in range(0,100)] In [4]: query = Q() ...: for v1, v2 in items_to_fetch: ...: query |= Q(val1=v1, val2=v2) %timeit x = ExampleModel.objects.filter(query) 644 ms ± 6.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) }}} If we cProfile that `ExampleModel.objects.filter(query)` we get something like: {{{ ncalls tottime percall cumtime percall filename:lineno(function) 30001/1 0.178 0.000 1.268 1.268 query.py:1221(build_filter) 40000 0.142 0.000 0.157 0.000 query.py:1457(names_to_path) 10002/1 0.077 0.000 1.268 1.268 query.py:1386(_add_q) 20000 0.046 0.000 0.379 0.000 query.py:1156(build_lookup) }}} and if we do `%lprun -f Query.build_filter -f Query.build_lookup ExampleModel.objects.filter(query)` on it we'll get (snipped for brevity): {{{ Function: build_filter at line 1221 Line # Hits Time Per Hit % Time Line Contents ============================================================== 1329 20000 651989.0 32.6 24.9 condition = self.build_lookup(lookups, col, value) Function: build_lookup at line 1156 Line # Hits Time Per Hit % Time Line Contents ============================================================== 1195 20000 200730.0 10.0 38.1 if (connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls and 1196 lookup_name == 'exact' and lookup.rhs == ''): 1197 return lhs.get_lookup('isnull')(lhs, True }}} Because I'm using sqlite, that 38% of time is essentially free to reclaim by switching the if condition as above, in which case we get cProfile output like so: {{{ ncalls tottime percall cumtime percall filename:lineno(function) 30001/1 0.171 0.000 1.082 1.082 query.py:1220(build_filter) 40000 0.145 0.000 0.159 0.000 query.py:1456(names_to_path) 10002/1 0.076 0.000 1.082 1.082 query.py:1385(_add_q) 20000 0.042 0.000 0.177 0.000 query.py:1559(setup_joins) 220005 0.039 0.000 0.064 0.000 {built-in method builtins.isinstance} 20000 0.037 0.000 0.223 0.000 query.py:1156(build_lookup) }}} and line profile information like so: {{{ Function: build_filter at line 1220 Line # Hits Time Per Hit % Time Line Contents ============================================================== 1328 20000 447682.0 22.4 18.9 condition = self.build_lookup(lookups, col, value) Function: build_lookup at line 1156 Line # Hits Time Per Hit % Time Line Contents ============================================================== 1195 20000 12744.0 0.6 3.9 if (lookup_name == 'exact' and lookup.rhs == '' and 1196 connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls): 1197 return lhs.get_lookup('isnull')(lhs, True }}} which ultimately changes the timed performance for non oracle backends to: {{{ In [5]: %timeit x = ExampleModel.objects.filter(query) 547 ms ± 3.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) }}} (YMMV; it's being tested on a laptop and I've seen it ranging from 519ms to 600ms average.) Patch will be forthcoming as a PR shortly, to confirm it doesn't break CI. -- Ticket URL: <https://code.djangoproject.com/ticket/33025> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-updates+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/052.e6e48a16690562614c7ee62bcad19ba6%40djangoproject.com.