Привет Djanonaughts!

tl;dr: django with buildout spends a lot of time looking for files that
aren't there when trying to do imports. How can I reduce this?



I've long been frustrated with the time it takes my django installation to
run tests. Now, I'm aware that some ways to think about to how to write
fast and efficient tests in
django<http://www.slideshare.net/cordiskinsey/djangocon-2013-how-to-write-fast-and-efficient-unit-tests-in-django>
 include:
- putting things in setUpClass rather than setUp
- treating the database as hot lava (at least, when you don't have
ManyToMany relations)
- using sqlite3 in-memory
- using mock to emulate instances
- using nose to only run the tests you need
But I want to talk about another component of test performance: Disk I/O
syscalls

My company (the startup accelerator MassChallenge) uses
buildout<http://www.buildout.org/en/latest/> to
manage dependencies and the several apps modules within our appication. So
I run tests from the root of the buildout by calling `bin/django test`, a
script that looks like the following: (note the datetime printing)
```
#!/home/afarrell/projects/masschallenge/venv/bin/python
from datetime import datetime
print datetime.now().isoformat()
print "before test"

import sys
sys.path[0:0] = [
  
'/home/afarrell/projects/masschallenge/mc2013/eggs/MySQL_python-1.2.3-py2.7-linux-i686.egg',
  ... 38 other paths to various eggs...
  '/home/afarrell/projects/masschallenge/venv/lib/python2.7/site-packages',
  '/home/afarrell/projects/masschallenge/mc2013', #the root of our app
  ]

import djangorecipe.manage

if __name__ == '__main__':
    sys.exit(djangorecipe.manage.main('mcproject.develop_settings'))
```

I am in the habit of writing a single test as I am writing a method. To do
this, I use nose's attrib
tag<http://nose.readthedocs.org/en/latest/plugins/attrib.html> in
a test file that looks like this:
```
from nose.plugins.attrib import attr
import nose.tools as nt
from django.test import TestCase
from datetime import datetime

class TestExample(Testcase):
    @attr('now')
    def test_trivial_case(self):
        print datetime.now().isoformat()
        print "test has begun"
        nt.assert_equal(1 + 1, 2)
```

I then run tests with `bin/django test mentor_match -a'now' -s` and this
prints out the following:
```
2014-03-05T13:46:02.774142
before test
nosetests --verbosity 1 mentor_match -anow -s
To reuse old database "test_02_06_18_29" for speed, set env var REUSE_DB=1.
Creating test database for alias 'default'...
2014-03-05T13:46:28.612895
test has begun
```
As you can see, there are about 26 seconds from the test command is issued
and when the first test actually runs. This delay seems small but it is
noticeable and kinda annoying when that is the only test. You could load,
present, and fire a musket in that time. There are two possible causes that
I can think of:
1) The test database takes a long time to set up
2) The test command takes a long time to import all required modules

However the if I `export REUSE_DB=1` and then run `bin/django test
mentor_match -a'now' -s`, I see
2014-03-05T15:22:51.743338
before test
nosetests --verbosity 1 mc_allocate -s -anow
2014-03-05T15:23:15.813821
test has begun
which is still a difference of 24 seconds. Working on a linode, I often get
notifications about the high disk i/o rate, so it makes sense that the
cause is the latter. So, I turn to the sysadmin's microsope: strace.

With REUSE_DB not set, I run `strace -f bin/django test mentor_match -s
-a'now' 2> tracelog`. I can now see all of the system calls made by the
command, 67626. `cat tracelog | grep "open(" | wc -l` shows that 37635 of
those are open commands.

total:   67626
open:    37635  56.7%
stat64:  15401  22.8%
read:     5362   7.9%
fstat64:  2619   3.9%
write:    2316   3.4%
poll:     2268   3.4%
close:    1754   2.6%
mmap2:     949   1.4%
munmap:    847   1.3%

Since there are far more open calls than there are read calls, and
there are 45847 lines containing "ENOENT (No such file or directory)"
(86.4% of the open or stat64 calls), I think that django spends a lot of
time looking for files in the wrong location.


I plan to write a blog post about this, but before I dig in, does anyone
have any comments on the following approaches I'm considering:
- reordering the paths assigned to sys.path
- removing items from sys.path when I am running tests
- caching the locations where modules were previously found and looking
there first

Has anyone looked at doing anything like this?
-- 
*Andrew Farrell*
Software Engineer *|* MassChallenge, Inc.
1.888.782.7820x720 *|* amfarr...@masschallenge.org
ONE Marina Park Drive *|* Boston, MA 02210


* Free m entorship, resources, coworking space. No equity taken.
Applications now open: 2014 MassChallenge accelerator
<http://masschallenge.org/apply> Apply or refer great startups today! *

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/CA%2By5TLYX8XJyb%2Ba6hPygs4p%3DLBNz-n%2BO1RuaDGqTtVVWdxi5Xw%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to