To answer my own question: No, I wasn't doing it correctly. I should have done a sanity check before posting.
New timeit code and results are at the bottom, now using a smaller dataset (~900 objects). Notable:

- An .exists() query takes about 1/100 of the time of a full fetch in this case. The difference would of course be much bigger if I did it on the 100,000+ objects.
- If already prefetched, searching the prefetched data is only worth it if the dataset is <700 objects. It seems rare to prefetch that many objects, and if so, a developer could easily be explicit and use the .exists() method.

*Based on this exact scenario: 100 times slower for obj in queryset is quite a performance hit. On the other hand, an extra 1% of query time for a queryset that is later evaluated is marginal.*

If a developer knows they want to evaluate the queryset later, it would be nice to have an .evaluate() method that does self._fetch_all() and returns self. I think that's preferable to the list(queryset) method, because we don't need to create an extra list.

# Old explicit fetch method
queryset = Stuff.objects.all()
if obj in list(queryset):
    pass

# My suggestion
queryset = Stuff.objects.all().evaluate()
if obj in queryset:
    pass

Updated timeit function and results:

import timeit

def time_fn(title, stmt, number=10_000):
    setup = [
        'from movies.models import Movie',
        'qs = Movie.objects.all()',
        'obj_first = qs.first()',
        'obj_mid = qs[300]',
        'obj_last = qs.last()',
        'list(qs)',
    ]
    result_time = timeit.timeit(stmt, setup='\n'.join(setup), number=number)
    print(f'{title}: {result_time}')

time_fn('Database fetch', 'qs._result_cache=None\nlist(qs)')
time_fn('Prefetched obj_first in qs', 'obj_first in qs')
time_fn('Prefetched obj_mid in qs', 'obj_mid in qs')
time_fn('Prefetched obj_last in qs', 'obj_last in qs')
time_fn('Database obj_first exists()', 'qs.filter(pk=obj_first.pk).exists()')
time_fn('Database obj_mid exists()', 'qs.filter(pk=obj_mid.pk).exists()')
time_fn('Database obj_last exists()', 'qs.filter(pk=obj_last.pk).exists()')

# Results
Database fetch: 616.097227364
Prefetched obj_first in qs: 0.030961711003328674
Prefetched obj_mid in qs: 6.6988333979970776
Prefetched obj_last in qs: 24.189914419999695
Database obj_first exists(): 6.468764332996216
Database obj_mid exists(): 6.167532913001196
Database obj_last exists(): 6.0190791100030765

On Wed, Jun 3, 2020 at 10:52 AM Johan Schiff <jo...@radkompaniet.se> wrote:

> Thanks for the great info.
>
> First, I'm leaning towards Aymeric's proposition here. I do recognize that
> there is a lot to consider. These points seem to be important:
>
> 1. Developers must be able to explicitly choose methods to optimize
>    for their environment (considering database latency, dataset size,
>    whether prefetching is preferred, etc.).
> 2. Behaviour should preferably be consistent across methods (len(qs),
>    bool(qs), obj in qs).
> 3. Syntax should be pythonic.
>
> I think what Aymeric is proposing would fulfil those demands.
>
> I have done some timeit tests on a model with 100,000+ records, based on
> Roger's work, on a single-host setup using PostgreSQL. My finding is that a
> separate database query is generally faster than the current *obj in qs*
> behaviour, even on a prefetched queryset (unless your dataset is really
> small). This tells me that we should prioritize the .exists() query unless
> explicitly stated otherwise (i.e. *obj in iter(qs)*), even when we have
> prefetched data. For len(qs) and bool(qs) that should not be the case.
>
> It would be interesting to get timings from other setups, so I'm including
> the code I used. (Also, am I doing this correctly?)
>
> import timeit
>
> def time_fn(title, stmt, number=10000, prefetch=False):
>     setup = [
>         'from tickets.models import Ticket',
>         'qs=Ticket.objects.all()',
>         'obj_first = qs.first()',
>         'obj_mid = qs[50000]',
>         'obj_last = qs.last()',
>     ]
>     if prefetch:
>         setup.append('list(qs)')
>     result_time = timeit.timeit(stmt, setup='\n'.join(setup), number=number)
>     print(f'{title}: {result_time}')
>
> time_fn('Database fetch', 'list(qs)')
> time_fn('Prefetched obj_first in qs', 'obj_first in qs', prefetch=True)
> time_fn('Prefetched obj_mid in qs', 'obj_mid in qs', prefetch=True)
> time_fn('Prefetched obj_last in qs', 'obj_last in qs', prefetch=True)
> time_fn('Database obj_first exists()', 'qs.filter(pk=obj_first.pk).exists()')
> time_fn('Database obj_mid exists()', 'qs.filter(pk=obj_mid.pk).exists()')
> time_fn('Database obj_last exists()', 'qs.filter(pk=obj_last.pk).exists()')
>
> My results:
>
> Database fetch: 25.667968138001015
> Prefetched obj_first in qs: 0.027538340998944477
> Prefetched obj_mid in qs: 1051.1691511649988
> Prefetched obj_last in qs: 2369.5217889660016
> Database obj_first exists(): 6.442390248001175
> Database obj_mid exists(): 6.213641551999899
> Database obj_last exists(): 5.831252460000542
>
> On Tue, Jun 2, 2020 at 11:15 PM Roger Gammans <rgamm...@gammascience.co.uk> wrote:
>
>> On Tue, 2020-06-02 at 11:31 -0700, Javier Buzzi wrote:
>>
>> ps @roger
>>
>> >>> timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import
>> myapp.models as m;x = m.GL.objects.all()[324]', number=100)
>> 0.05818330496549606
>>
>> is not doing anything; add a `.exists()` or `len(..)` or something to
>> evaluate the queryset. That number is WAY too low.
>>
>> Yes, of course. That makes me feel silly, on both counts - the
>> evaluations and the number of iterations.
I made the number of iterations
>> low because I started with a test in which a single iteration was taking
>> multiple seconds, and in that case the default number was way too high for
>> the quick test I was offering. More critically, `x in qs` depends on
>> where in the queryset the object is, and it doesn't matter how big the
>> queryset is if you only look at the beginning.
>>
>> I reran it, and it shows different results; see the re-run below. The
>> construction here is more complicated in order to show the average in
>> microseconds per iteration. Note the varying iteration counts - the actual
>> runtime was aimed to be between 20 seconds and a minute or so.
>>
>> >>> # Searching for first in queryset
>> >>> n=10_000_000; timeit.timeit('x in qs', setup='import myapp.models as
>> >>> m;qs = m.GL.objects.all(); qs2=qs;i=len(qs2);x=qs[0]',
>> >>> number=n)/(n/1_000_000)
>> 0.4136358265765011
>> >>> n=100_000; timeit.timeit('m.GL.objects.filter(pk=x.pk)',
>> >>> setup='import myapp.models as m;x = list(m.GL.objects.all())[0]',
>> >>> number=n)/(n/1_000_000)
>> 124.99445254914463
>>
>> >>> # Searching for last in queryset
>> >>> n=1000; timeit.timeit('x in qs', setup='import myapp.models as m;qs =
>> >>> m.GL.objects.all(); qs2=qs;i=len(qs2);x=qs[i-1]', number=n)/(n/1_000_000)
>> 50741.098549216986
>> >>> n=100_000; timeit.timeit('m.GL.objects.filter(pk=x.pk)',
>> >>> setup='import myapp.models as m;x = list(m.GL.objects.all())[-1]',
>> >>> number=n)/(n/1_000_000)
>> 118.99649207945913
>>
>> --
>> Roger Gammans <rgamm...@gammascience.co.uk>
>>
>> --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "Django developers (Contributions to Django itself)" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe
>> To unsubscribe from this group and all its topics, send an email to
>> django-developers+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/django-developers/3928039038bac9b52279294f7efcac318dc80388.camel%40gammascience.co.uk
>
> --
> Kind regards,
> Johan Schiff
> Radkompaniet AB
> 072-229 61 19

--
Kind regards,
Johan Schiff
Radkompaniet AB
072-229 61 19

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/CAD69rUDECkftJxchNM3pxMnLw4gdzq3BWg0JLx5bt4gZRwg0rQ%40mail.gmail.com
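[Editor's note: the two ideas discussed in this thread - an explicit .evaluate() that fills the result cache and returns the queryset, and an `obj in qs` that uses the cache when populated but falls back to a cheap .exists()-style query otherwise - can be sketched without Django at all. The MockQuerySet class below is purely illustrative (not Django's API); the fetch/exists callables stand in for the SQL queries a real queryset would run.]

```python
class MockQuerySet:
    """Plain-Python stand-in for a queryset, to illustrate the proposal."""

    def __init__(self, fetch_rows, exists_check):
        self._fetch_rows = fetch_rows      # stands in for a full SELECT
        self._exists_check = exists_check  # stands in for SELECT ... EXISTS
        self._result_cache = None
        self.queries = []                  # record which "SQL" we ran

    def _fetch_all(self):
        if self._result_cache is None:
            self.queries.append('full fetch')
            self._result_cache = list(self._fetch_rows())

    def evaluate(self):
        """Proposed helper: fill the cache and return self (no extra list)."""
        self._fetch_all()
        return self

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def __contains__(self, obj):
        # Prefetched data: answer from the cache, no query at all.
        if self._result_cache is not None:
            return obj in self._result_cache
        # Not prefetched: one cheap existence query instead of a full fetch.
        self.queries.append('exists query')
        return self._exists_check(obj)


rows = [1, 2, 3]
qs = MockQuerySet(lambda: rows, lambda obj: obj in rows)

assert 2 in qs                      # cache empty -> cheap exists query
assert qs.queries == ['exists query']

qs.evaluate()                       # explicit opt-in to prefetching
assert 3 in qs                      # now served from the cache, no query
assert qs.queries == ['exists query', 'full fetch']
```

The point of `evaluate()` over `list(queryset)` is visible in the last two asserts: membership tests after the explicit fetch run against the cache without building a throwaway list.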