#33015: QuerySet.extra Use Case: filtering on large lists of tuples
-------------------------------------+-------------------------------------
     Reporter:  kris-swann           |                    Owner:  nobody
         Type:  Uncategorized        |                   Status:  new
    Component:  Database layer       |                  Version:  2.2
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  QuerySet.extra       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

 * cc: Keryn Knight (added)


Comment:

 Having been bitten by this same thing multiple times over the years, I'm
 interested.

 The performance profile of this is reproducible for me on the `2.2.9` and
 `3.2.6` tags:
 {{{
 In [5]: %%timeit
    ...: query = Q()
    ...: for v1, v2 in items_to_fetch:
    ...:     query |= Q(val1=v1, val2=v2)
 21.9 s ± 1.01 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
 In [7]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query)
 48.2 s ± 1.54 s per loop (mean ± std. dev. of 3 runs, 1 loop each)
 }}}
 but doesn't appear to be a ''big'' issue on `main` since #32940; my
 suspicion is that both `WhereNode` and `Q` were hitting the pathological
 case for `data in self.children` being O(n), in case that rings a bell
 Simon; below is main as of `8208381ba6`; the problem was present at
 `501a8db465` and basically gone at `ff661dbd50`:
 {{{
 In [5]: %%timeit
    ...: query = Q()
    ...: for v1, v2 in items_to_fetch:
    ...:     query |= Q(val1=v1, val2=v2)
 209 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 In [7]: %timeit x = ExampleModel.objects.filter(query)
 644 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 }}}
 Though it's still faster to avoid combining Q objects and kwargs:
 {{{
 In [14]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in
 items_to_fetch), _connector=Q.OR)
 15.1 ms ± 761 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
 }}}
 That's true for `2.2.9` too, but regrettably has less impact on the query
 compilation itself:
 {{{
 In [4]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in
 items_to_fetch), _connector=Q.OR)
 22.9 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
 In [5]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query)
 20.6 s ± 874 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
 }}}

 I don't know that the simple Q based version will ever be as fast as
 `extra` (as used above) or the `ExpressionTuple` (also above), but it
 begins to look ''almost'' acceptable, certainly compared to the previous.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33015#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/068.7d3d0b1a2d3d23558ba619a9aa3218a8%40djangoproject.com.

Reply via email to