Hi All, Background to this post is available at http://code.djangoproject.com/ticket/15361
I've created a better benchmark in order to test where the change in the above ticket causes a performance regression. These are the results of those tests. First of all, the results are based on query.get() on 20000 Book objects, generated by the following code [1] I ran the following 2 benchmarks query_get [2] and query_get_multiple [3] on MySQL and sqlite. query_get is a simple get() using a unique indexed column and will return one object only. query_get_multiple is a get() on an non-indexed column, it will return 1284 objects. The benchmark results, as reported by djangobench [4] are as follows: sqlite: ----------------------------------- Running all benchmarks Control: Django 1.3 beta 1 (in django-control) Experiment: Django 1.3 beta 1 (in django-experiment) Running 'query_get' benchmark ... Min: 0.000000 -> 0.000000: incomparable (one result was zero) Avg: 0.000745 -> 0.000979: 1.3141x slower Significant (t=-5.900348) Stddev: 0.00263 -> 0.00297: 1.1318x larger (N = 10000) Running 'query_get_multiple' benchmark ... Min: 0.020000 -> 0.000000: incomparable (one result was zero) Avg: 0.029883 -> 0.001072: 27.8759x faster Significant (t=482.259535) Stddev: 0.00511 -> 0.00309: 1.6519x smaller (N = 10000) ----------------------------------- mysql ----------------------------------- Running all benchmarks Control: Django 1.3 beta 1 (in django-control) Experiment: Django 1.3 beta 1 (in django-experiment) Running 'query_get' benchmark ... Min: 0.000000 -> 0.000000: incomparable (one result was zero) Avg: 0.000810 -> 0.001039: 1.2827x slower Significant (t=-5.591014) Stddev: 0.00273 -> 0.00305: 1.1169x larger (N = 10000) Running 'query_get_multiple' benchmark ... Min: 0.020000 -> 0.000000: incomparable (one result was zero) Avg: 0.028856 -> 0.001152: 25.0486x faster Significant (t=429.695948) Stddev: 0.00560 -> 0.00319: 1.7544x smaller (N = 10000) ----------------------------------- As you can see, with 10000 trials, the speed differences are significant (and repeatable) as are roughly as follows: query.get() on a unique indexed column runs 1.3x slower query.get() on an non-unique, non-index column runs 25-27x FASTER I've done some very quick tests to see how these gains/losses scale for various values of n objects: The speedup for the non-indexed columns is exponential to n. The slowdown for the indexed columns is roughly constant for all n. So there you have it, we have a small regression in performance for the most common case use, and a huge potential gain for the less used (and some would argue, badly designed) query. What do you think, is the gain worth the hit? Is it possible to have 2 different code paths depended on what column(s) the query is filtering on? -- Martin P.S Between each trial, djangobench will try and reload the initial_data.json fixture, which for 20000 objects is very time consuming. If you want to reproduce the results yourself, I'd suggest creating a database with the objects already in it, instead of relying on fixtures. ---- [1] object generation script -- http://pastebin.com/6JAJDA6f [2] query_get benchmark -- http://pastebin.com/qZBdvSie [3] query_get_multiple benchmark -- http://pastebin.com/iEYsfmd5 [4] djangobench project (Luke's fork) -- https://github.com/spookylukey/djangobench -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
