Re: Implement QuerySet.contains?

Johan Schiff Wed, 03 Jun 2020 01:52:45 -0700

Thanks for great info.

First, I'm leaning towards Aymeric's proposition here. I do recognize that
there is a lot to consider.


This seems to be important:

   1. Developers must be able to explicitly choose methods to optimize for
   their environment. (Considering database latency, dataset size, prefetch
   preferred, etc.)
   2. Behaviour should preferably be consistent across methods. (len(qs),
   bool(qs), obj in qs)
   3. Syntax should be pythonic.

I think what Aymeric is proposing would fulfill those demands.

I have done some timeit-tests on a model with 100 000+ records, based on
Rogers work, on a single host setup using postgresql. My finding is that a
separate database query is generally faster than current *obj in qs* behaviour,
even on a prefetched queryset (unless your dataset is really small). This
tells me that we should prioritize .exists() query unless explicitly stated
(i.e. *obj in iter(qs)*), even when we have prefetched data. For len(qs)
and bool(qs) that should not be the case.

It would be interesting to get timings from other setups, so I'm including
the code I used. (Also, am I doing this correctly?)

import timeit

def time_fn(title, stmt, number=10000, prefetch=False):
    setup = [
        'from tickets.models import Ticket',
        'qs=Ticket.objects.all()',
        'obj_first = qs.first()',
        'obj_mid = qs[50000]',
        'obj_last = qs.last()',
    ]
    if prefetch:
        setup.append('list(qs)')
    result_time = timeit.timeit(stmt, setup='\n'.join(setup), number=number)
    print(f'{title}: {result_time}')

time_fn('Database fetch', 'list(qs)')
time_fn('Prefetched obj_first in qs', 'obj_first in qs', prefetch=True)
time_fn('Prefetched obj_mid in qs', 'obj_mid in qs', prefetch=True)
time_fn('Prefetched obj_last in qs', 'obj_last in qs', prefetch=True)
time_fn('Database obj_first exists()', 'qs.filter(pk=obj_first.pk
).exists()')
time_fn('Database obj_mid exists()', 'qs.filter(pk=obj_mid.pk).exists()')
time_fn('Database obj_last exists()', 'qs.filter(pk=obj_last.pk).exists()')


My results:

Database fetch: 25.667968138001015
Prefetched obj_first in qs: 0.027538340998944477
Prefetched obj_mid in qs: 1051.1691511649988
Prefetched obj_last in qs: 2369.5217889660016
Database obj_first exists(): 6.442390248001175
Database obj_mid exists(): 6.213641551999899
Database obj_last exists(): 5.831252460000542


Le mar. 2 juin 2020 à 23:15, Roger Gammans <rgamm...@gammascience.co.uk> a
écrit :

> On Tue, 2020-06-02 at 11:31 -0700, Javier Buzzi wrote:
>
>
>
> ps @roger
>
> >>> timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import
> myapp.models as m;x = m.GL.objects.all()[324]', number=100)
> 0.05818330496549606
>
> is not doing anything, add a `.exists()` or `len(..)` or something to 
> evaluate the queryset. That number is WAY too low.
>
>
> Yes, of course. That makes me feel silly, on both counts - the evaluations
> and the number of iterations. I made number of iterations low;because I
> started with a test which a single iteration was taking multiple seconds
> and it that case the default number was way to high for the quick test I
> was offering. More critically the x in qs; depends on where in the qs the
> object is and it doesn't matter how big if you only look at the beginning.
>
> I reran it; but which shows different results.. See the following re run.
> Construction here is more complicated to show avg microsecs; per iteration,
> note the vary iteration lengths, actual runtime was aimed to be between
> 20secs and a minute or so.
>
> >>> # Searching for first in queryset
>
> >>> n=10_000_000 ;timeit.timeit('x in qs', setup='import myapp.models as m;qs 
> >>> = m.GL.objects.all(); qs2=qs;i=len(qs2);x=qs[0]', number=n)/(n/1_000_000)
>
> 0.4136358265765011
>
> >>> n = 100_000 ; timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import 
> >>> myapp.models as m;x = list(m.GL.objects.all())[0]',  
> >>> number=n)/(n/1_000_000)
>
> 124.99445254914463
>
> >>> # Searching for last in queryset.
>
> >>> n=1000; timeit.timeit('x in qs', setup='import myapp.models as m;qs = 
> >>> m.GL.objects.all(); qs2=qs;i=len(qs2);x=qs[i-1]', number=n)/(n/1_000_000)
>
> 50741.098549216986
>
> >>> n = 100_000 ; timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import 
> >>> myapp.models as m;x = list(m.GL.objects.all())[-1]',  
> >>> number=n)/(n/1_000_000)
>
> 118.99649207945913
>
>
>
>
> --
>
> Roger Gammans <rgamm...@gammascience.co.uk>
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/3928039038bac9b52279294f7efcac318dc80388.camel%40gammascience.co.uk
> <https://groups.google.com/d/msgid/django-developers/3928039038bac9b52279294f7efcac318dc80388.camel%40gammascience.co.uk?utm_medium=email&utm_source=footer>
> .
>


-- 
Vänligen, Johan Schiff
Radkompaniet AB
072-229 61 19

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD69rUA1E1Vfdm7kuv32r1Ppyw%3D7CCqCzfSW9itda88th2Uw3g%40mail.gmail.com.

Re: Implement QuerySet.__contains__?

Reply via email to

Re: Implement QuerySet.contains?