#17025: sql/where.py WhereNode refactoring
----------------------------------------------+-------------------------
     Reporter:  akaariai                      |      Owner:  nobody
         Type:  Cleanup/optimization          |     Status:  new
    Component:  Database layer (models, ORM)  |    Version:  1.3
     Severity:  Normal                        |   Keywords:  orm cleanup
 Triage Stage:  Unreviewed                    |  Has patch:  1
Easy pickings:  0                             |      UI/UX:  0
----------------------------------------------+-------------------------
 First the good results: running tests is now about 10% faster. This is
 purely because SQL generation is faster, no test runner changes. I also
 ran theses 3 tests:
   - 1000 times User.objects.filter(pk=1)
   - 200 times qs = qs.filter(pk=1) That is, chained queryset cloning test
   - 1000 times, in a transaction, fetch an object and save it.

 Results against PostgreSQL on localhost:

 Patched:
  - 0:00:00.232005
  - 0:00:00.053464
  - 0:00:02.188870

 Master:
  - 0:00:00.375182
  - 0:00:02.226373
  - 0:00:03.722031

 So, about 30% faster, nearly 50x faster, and about 40% faster. There is a
 attached tests.txt file which describes the tests, and gives some results.

 This ticket is related to the add_q refactoring (#17000). There are 3
 different goals here:
   - make the code cleaner and easier to understand
   - make the code more correct
   - make the code faster

 So far I think I have managed to simplify the sql/query.py wherenode
 handling, and sql/where.py also. There is much better separation of where
 things are done. Due to some problems with aggregates, currently the
 compiler.py is not any cleaner, possibly even the opposite.

 The attached patch is WIP. There might be stale comments, and there is
 some rambling about various things in comments. I wont have time to work
 on this any more just now, so I decided to post the patch for testing and
 benchmarking. It does pass all the test. I believe it can handle queries
 which the trunk version can't but I haven't tested, so nothing more about
 those.

 At the end some bad news: Constructing SQL for simple queries is still too
 slow. Here is a profile of fetching 1000 objects by
 User.objects.get(pk=1).

 {{{
 2.503 CPU seconds
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1001    0.224    0.000    0.243    0.000 {method 'execute' of
 'psycopg2._psycopg.cursor' objects}
      2004    0.093    0.000    0.247    0.000 query.py:274(clone)
     22679    0.070    0.000    0.070    0.000 {hasattr}
      6183    0.053    0.000    0.078    0.000
 datastructures.py:108(__init__)
 30238/22735    0.046    0.000    1.256    0.000 {len}
      1000    0.044    0.000    0.465    0.000 compiler.py:80(as_sql)
      2000    0.043    0.000    1.200    0.001 query.py:230(iterator)
     22323    0.041    0.000    0.041    0.000 {isinstance}
    474/99    0.040    0.000    0.154    0.002 sre_parse.py:385(_parse)
      1000    0.038    0.000    0.096    0.000
 compiler.py:279(get_default_columns)
      1000    0.038    0.000    0.327    0.000 util.py:31(execute)
     24678    0.037    0.000    0.037    0.000 {method 'append' of 'list'
 objects}
      1000    0.036    0.000    0.085    0.000 base.py:279(__init__)
      1000    0.033    0.000    0.137    0.000 where.py:71(make_atom)
      1000    0.032    0.000    0.301    0.000 query.py:1049(add_filter)

 }}}

 Without profiling the time to run the test is about 1 second, while using
 raw SQL yields 0.2 second timing.

 What does this tell us? There is still a lot of room for improvement, but
 there is no single source of time loss. There are however a lot of small
 things we can improve somewhat easily. But more of those later.

 Now I would like:
   - To get some realword benchmarks / tests. Does it break, is it faster
 for real world queries?
   - To get some feedback from the Django core developers who understand
 the ORM. I am willing to work on cleaning up the ORM, but I need some
 confirmation that the direction I took is acceptable.
   - It is not yet time to review the patch, but please take a look and say
 what you think. I would recommend to start from sql/query.py add_q and
 compare it to the trunk version.

 Sorry that the patch is so large. The problem is, in the ORM most things
 are connected to each other, so when you change one part, it easily
 cascades to more things. I stopped at aggregation refactor for now.

 Thank you for your time!

-- 
Ticket URL: <https://code.djangoproject.com/ticket/17025>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en.

Reply via email to