Benchmarking change to query.get() #15361

mmcnickle Thu, 24 Feb 2011 06:59:20 -0800

Hi All,

Background to this post is available at 
http://code.djangoproject.com/ticket/15361


I've created a better benchmark in order to test whether the change in
the above ticket causes a performance regression. These are the
results of those tests.

First of all, the results are based on query.get() on 20000 Book
objects, generated by the following code [1].

I ran two benchmarks, query_get [2] and query_get_multiple [3], on
MySQL and SQLite. query_get is a simple get() using a unique indexed
column, and its query matches one object only. query_get_multiple is
a get() on a non-indexed column; its query matches 1284 objects.

The benchmark results, as reported by djangobench [4] are as follows:

sqlite:

-----------------------------------
Running all benchmarks
Control: Django 1.3 beta 1 (in django-control)
Experiment: Django 1.3 beta 1 (in django-experiment)

Running 'query_get' benchmark ...
Min: 0.000000 -> 0.000000: incomparable (one result was zero)
Avg: 0.000745 -> 0.000979: 1.3141x slower
Significant (t=-5.900348)
Stddev: 0.00263 -> 0.00297: 1.1318x larger (N = 10000)

Running 'query_get_multiple' benchmark ...
Min: 0.020000 -> 0.000000: incomparable (one result was zero)
Avg: 0.029883 -> 0.001072: 27.8759x faster
Significant (t=482.259535)
Stddev: 0.00511 -> 0.00309: 1.6519x smaller (N = 10000)
-----------------------------------

mysql:

-----------------------------------
Running all benchmarks
Control: Django 1.3 beta 1 (in django-control)
Experiment: Django 1.3 beta 1 (in django-experiment)

Running 'query_get' benchmark ...
Min: 0.000000 -> 0.000000: incomparable (one result was zero)
Avg: 0.000810 -> 0.001039: 1.2827x slower
Significant (t=-5.591014)
Stddev: 0.00273 -> 0.00305: 1.1169x larger (N = 10000)

Running 'query_get_multiple' benchmark ...
Min: 0.020000 -> 0.000000: incomparable (one result was zero)
Avg: 0.028856 -> 0.001152: 25.0486x faster
Significant (t=429.695948)
Stddev: 0.00560 -> 0.00319: 1.7544x smaller (N = 10000)
-----------------------------------

As you can see, with 10000 trials, the speed differences are
significant (and repeatable) and are roughly as follows:

query.get() on a unique indexed column runs 1.3x slower
query.get() on a non-unique, non-indexed column runs 25-27x FASTER
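
For anyone who wants to check the arithmetic, here is a small sketch that
recomputes the "x slower" / "x faster" ratios from the Avg lines printed
above. The describe() helper is my own illustration, not part of
djangobench; it assumes the ratio is simply the larger average divided by
the smaller one, which matches the printed figures to four decimals.

```python
# Average timings in seconds (control, experiment), copied from the
# djangobench output above.
results = {
    ("sqlite", "query_get"): (0.000745, 0.000979),
    ("sqlite", "query_get_multiple"): (0.029883, 0.001072),
    ("mysql", "query_get"): (0.000810, 0.001039),
    ("mysql", "query_get_multiple"): (0.028856, 0.001152),
}


def describe(control, experiment):
    """Return (ratio, direction) worded the way the report words it.

    Hypothetical helper: assumes ratio = larger average / smaller average.
    """
    if experiment > control:
        return experiment / control, "slower"
    return control / experiment, "faster"


for (backend, name), (control, experiment) in results.items():
    ratio, direction = describe(control, experiment)
    print(f"{backend:6} {name:18} {ratio:.4f}x {direction}")
```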

I've done some very quick tests to see how these gains/losses scale
for various values of n objects:

The speedup for the non-indexed columns grows exponentially with n.
The slowdown for the indexed columns is roughly constant for all n.

So there you have it, we have a small regression in performance for
the most common use case, and a huge potential gain for the less used
(and some would argue, badly designed) query.

What do you think, is the gain worth the hit? Is it possible to have two
different code paths depending on what column(s) the query is filtering
on?
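
To make the two-code-path question concrete, here is a rough sketch of the
decision I have in mind. It is plain Python, not Django code: the function
name, the "control"/"experiment" labels and the example field names are
all hypothetical, and it only considers plain exact-match filters on
single columns.

```python
def choose_get_path(filter_fields, unique_fields):
    """Pick which get() implementation to use for a query.

    filter_fields: names of the columns the query filters on (exact match).
    unique_fields: names of the columns known to be unique.

    Returns "control" (current behaviour) when at least one filtered
    column is unique, since at most one row can match and the patched
    path only adds overhead there. Otherwise returns "experiment"
    (patched behaviour), where many rows may match.
    """
    if any(name in unique_fields for name in filter_fields):
        return "control"
    return "experiment"


# Hypothetical field names, for illustration only.
print(choose_get_path(["id"], {"id", "isbn"}))
print(choose_get_path(["author_name"], {"id", "isbn"}))
```

Whether the check itself is cheap enough not to eat the 1.3x it is meant
to save is something that would need benchmarking too.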

-- Martin

P.S. Between each trial, djangobench will try to reload the
initial_data.json fixture, which for 20000 objects is very time
consuming. If you want to reproduce the results yourself, I'd suggest
creating a database with the objects already in it, instead of relying
on fixtures.
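
As a rough illustration of pre-populating the database, the sketch below
bulk-inserts the rows into an sqlite file in a single transaction using
only the standard library. The table name and columns are made up for the
example; the real Book schema is in the generation script [1], and the
table would have to match whatever the benchmark's model expects.

```python
import sqlite3


def build_database(path, n=20000):
    """Create a hypothetical bench_book table and fill it with n rows.

    Returns the number of rows in the table afterwards.
    """
    conn = sqlite3.connect(path)
    # "with conn" wraps the statements in one transaction and commits it,
    # which is much faster than committing after every row.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS bench_book ("
            "id INTEGER PRIMARY KEY, title TEXT NOT NULL, pages INTEGER NOT NULL)"
        )
        conn.execute("DELETE FROM bench_book")
        conn.executemany(
            "INSERT INTO bench_book (id, title, pages) VALUES (?, ?, ?)",
            ((i, f"Book {i}", i % 500) for i in range(1, n + 1)),
        )
    count = conn.execute("SELECT COUNT(*) FROM bench_book").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    # "bench.db" is an arbitrary file name chosen for this example.
    print(build_database("bench.db"))
```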

----
[1] object generation script -- http://pastebin.com/6JAJDA6f
[2] query_get benchmark -- http://pastebin.com/qZBdvSie
[3] query_get_multiple benchmark -- http://pastebin.com/iEYsfmd5
[4] djangobench project (Luke's fork) -- 
https://github.com/spookylukey/djangobench
