#33025: Avoid accessing ConnectionsHandler.__getitem__ in Query.build_lookup 
unless
necessary, for performance.
-------------------------------------+-------------------------------------
               Reporter:  Keryn      |          Owner:  Keryn Knight
  Knight                             |
                   Type:             |         Status:  assigned
  Cleanup/optimization               |
              Component:  Database   |        Version:  dev
  layer (models, ORM)                |
               Severity:  Normal     |       Keywords:
           Triage Stage:             |      Has patch:  1
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 Currently, `build_lookup` contains:
 {{{
 # For Oracle '' is equivalent to null. The check must be done at this
 # stage because join promotion can't be done in the compiler. Using
 # DEFAULT_DB_ALIAS isn't nice but it's the best that can be done here.
 # A similar thing is done in is_nullable(), too.
 if
 (connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls
 and
         lookup_name == 'exact' and lookup.rhs == ''):
     return lhs.get_lookup('isnull')(lhs, True)
 }}}
 but going through `connections` requires accessing the thread critical
 `asgiref.Local` which is only strictly necessary if the cheap comparisons
 are truthy and can be short-circuited like so:
 {{{
 if (lookup_name == 'exact' and lookup.rhs == '' and
 connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls):
     return lhs.get_lookup('isnull')(lhs, True)
 }}}

 This is coming off the back of the report in #33015, which is a nice
 pathological case we can use:
 {{{
 In [1]: from django.db.models import Q
 In [2]: from testapp.models import ExampleModel
 In [3]: items_to_fetch = [(i, j) for i in range(0,100) for j in
 range(0,100)]
 In [4]: query = Q()
    ...: for v1, v2 in items_to_fetch:
    ...:     query |= Q(val1=v1, val2=v2)
 %timeit x = ExampleModel.objects.filter(query)
 644 ms ± 6.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 }}}
 If we cProfile that `ExampleModel.objects.filter(query)` we get something
 like:
 {{{
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   30001/1    0.178    0.000    1.268    1.268 query.py:1221(build_filter)
     40000    0.142    0.000    0.157    0.000 query.py:1457(names_to_path)
   10002/1    0.077    0.000    1.268    1.268 query.py:1386(_add_q)
     20000    0.046    0.000    0.379    0.000 query.py:1156(build_lookup)
 }}}
 and if we do `%lprun -f Query.build_filter -f Query.build_lookup
 ExampleModel.objects.filter(query)` on it we'll get (snipped for brevity):
 {{{
 Function: build_filter at line 1221
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
   1329     20000     651989.0     32.6     24.9          condition =
 self.build_lookup(lookups, col, value)

 Function: build_lookup at line 1156
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
   1195     20000     200730.0     10.0     38.1          if
 (connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls
 and
   1196
 lookup_name == 'exact' and lookup.rhs == ''):
   1197                                                       return
 lhs.get_lookup('isnull')(lhs, True
 }}}
 Because I'm using sqlite, that 38% of time is essentially free to reclaim
 by switching the if condition as above, in which case we get cProfile
 output like so:
 {{{
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   30001/1    0.171    0.000    1.082    1.082 query.py:1220(build_filter)
     40000    0.145    0.000    0.159    0.000 query.py:1456(names_to_path)
   10002/1    0.076    0.000    1.082    1.082 query.py:1385(_add_q)
     20000    0.042    0.000    0.177    0.000 query.py:1559(setup_joins)
    220005    0.039    0.000    0.064    0.000 {built-in method
 builtins.isinstance}
     20000    0.037    0.000    0.223    0.000 query.py:1156(build_lookup)
 }}}
 and line profile information like so:
 {{{
 Function: build_filter at line 1220
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
   1328     20000     447682.0     22.4     18.9          condition =
 self.build_lookup(lookups, col, value)

 Function: build_lookup at line 1156
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
   1195     20000      12744.0      0.6      3.9          if (lookup_name
 == 'exact' and lookup.rhs == '' and
   1196
 connections[DEFAULT_DB_ALIAS].features.interprets_empty_strings_as_nulls):
   1197                                                       return
 lhs.get_lookup('isnull')(lhs, True
 }}}

 which ultimately changes the timed performance for non oracle backends to:
 {{{
 In [5]: %timeit x = ExampleModel.objects.filter(query)
 547 ms ± 3.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 }}}
 (YMMV; it's being tested on a laptop and I've seen it ranging from 519ms
 to 600ms average.)

 Patch will be forthcoming as a PR shortly, to confirm it doesn't break CI.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33025>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/052.e6e48a16690562614c7ee62bcad19ba6%40djangoproject.com.

Reply via email to