#17025: sql/where.py WhereNode refactoring
----------------------------------------------+-------------------------
Reporter: akaariai | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 1.3
Severity: Normal | Keywords: orm cleanup
Triage Stage: Unreviewed | Has patch: 1
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+-------------------------
First the good results: running tests is now about 10% faster. This is
purely because SQL generation is faster, no test runner changes. I also
ran theses 3 tests:
- 1000 times User.objects.filter(pk=1)
- 200 times qs = qs.filter(pk=1) That is, chained queryset cloning test
- 1000 times, in a transaction, fetch an object and save it.
Results against PostgreSQL on localhost:
Patched:
- 0:00:00.232005
- 0:00:00.053464
- 0:00:02.188870
Master:
- 0:00:00.375182
- 0:00:02.226373
- 0:00:03.722031
So, about 30% faster, nearly 50x faster, and about 40% faster. There is a
attached tests.txt file which describes the tests, and gives some results.
This ticket is related to the add_q refactoring (#17000). There are 3
different goals here:
- make the code cleaner and easier to understand
- make the code more correct
- make the code faster
So far I think I have managed to simplify the sql/query.py wherenode
handling, and sql/where.py also. There is much better separation of where
things are done. Due to some problems with aggregates, currently the
compiler.py is not any cleaner, possibly even the opposite.
The attached patch is WIP. There might be stale comments, and there is
some rambling about various things in comments. I wont have time to work
on this any more just now, so I decided to post the patch for testing and
benchmarking. It does pass all the test. I believe it can handle queries
which the trunk version can't but I haven't tested, so nothing more about
those.
At the end some bad news: Constructing SQL for simple queries is still too
slow. Here is a profile of fetching 1000 objects by
User.objects.get(pk=1).
{{{
2.503 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
1001 0.224 0.000 0.243 0.000 {method 'execute' of
'psycopg2._psycopg.cursor' objects}
2004 0.093 0.000 0.247 0.000 query.py:274(clone)
22679 0.070 0.000 0.070 0.000 {hasattr}
6183 0.053 0.000 0.078 0.000
datastructures.py:108(__init__)
30238/22735 0.046 0.000 1.256 0.000 {len}
1000 0.044 0.000 0.465 0.000 compiler.py:80(as_sql)
2000 0.043 0.000 1.200 0.001 query.py:230(iterator)
22323 0.041 0.000 0.041 0.000 {isinstance}
474/99 0.040 0.000 0.154 0.002 sre_parse.py:385(_parse)
1000 0.038 0.000 0.096 0.000
compiler.py:279(get_default_columns)
1000 0.038 0.000 0.327 0.000 util.py:31(execute)
24678 0.037 0.000 0.037 0.000 {method 'append' of 'list'
objects}
1000 0.036 0.000 0.085 0.000 base.py:279(__init__)
1000 0.033 0.000 0.137 0.000 where.py:71(make_atom)
1000 0.032 0.000 0.301 0.000 query.py:1049(add_filter)
}}}
Without profiling the time to run the test is about 1 second, while using
raw SQL yields 0.2 second timing.
What does this tell us? There is still a lot of room for improvement, but
there is no single source of time loss. There are however a lot of small
things we can improve somewhat easily. But more of those later.
Now I would like:
- To get some realword benchmarks / tests. Does it break, is it faster
for real world queries?
- To get some feedback from the Django core developers who understand
the ORM. I am willing to work on cleaning up the ORM, but I need some
confirmation that the direction I took is acceptable.
- It is not yet time to review the patch, but please take a look and say
what you think. I would recommend to start from sql/query.py add_q and
compare it to the trunk version.
Sorry that the patch is so large. The problem is, in the ORM most things
are connected to each other, so when you change one part, it easily
cascades to more things. I stopped at aggregation refactor for now.
Thank you for your time!
--
Ticket URL: <https://code.djangoproject.com/ticket/17025>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/django-updates?hl=en.