Hello Djangonauts!

tl;dr: Django with buildout spends a lot of time looking for files that aren't there when it tries to do imports. How can I reduce this?
I've long been frustrated with the time it takes my Django installation to run tests. Now, I'm aware that some ways to write fast and efficient tests in Django<http://www.slideshare.net/cordiskinsey/djangocon-2013-how-to-write-fast-and-efficient-unit-tests-in-django> include:

- putting things in setUpClass rather than setUp
- treating the database as hot lava (at least, when you don't have ManyToMany relations)
- using sqlite3 in-memory
- using mock to emulate instances
- using nose to only run the tests you need

But I want to talk about another component of test performance: disk I/O syscalls.

My company (the startup accelerator MassChallenge) uses buildout<http://www.buildout.org/en/latest/> to manage dependencies and the several app modules within our application. So I run tests from the root of the buildout by calling `bin/django test`, a script that looks like the following (note the datetime printing):

```
#!/home/afarrell/projects/masschallenge/venv/bin/python
from datetime import datetime
print datetime.now().isoformat()
print "before test"

import sys
sys.path[0:0] = [
    '/home/afarrell/projects/masschallenge/mc2013/eggs/MySQL_python-1.2.3-py2.7-linux-i686.egg',
    # ... 38 other paths to various eggs ...
    '/home/afarrell/projects/masschallenge/venv/lib/python2.7/site-packages',
    '/home/afarrell/projects/masschallenge/mc2013',  # the root of our app
]

import djangorecipe.manage

if __name__ == '__main__':
    sys.exit(djangorecipe.manage.main('mcproject.develop_settings'))
```

I am in the habit of writing a single test as I am writing a method.
To do this, I use nose's attrib tag<http://nose.readthedocs.org/en/latest/plugins/attrib.html> in a test file that looks like this:

```
from datetime import datetime

from django.test import TestCase
from nose.plugins.attrib import attr
import nose.tools as nt


class TestExample(TestCase):
    @attr('now')
    def test_trivial_case(self):
        print datetime.now().isoformat()
        print "test has begun"
        nt.assert_equal(1 + 1, 2)
```

I then run tests with `bin/django test mentor_match -a'now' -s` and this prints out the following:

```
2014-03-05T13:46:02.774142
before test
nosetests --verbosity 1 mentor_match -anow -s
To reuse old database "test_02_06_18_29" for speed, set env var REUSE_DB=1.
Creating test database for alias 'default'...
2014-03-05T13:46:28.612895
test has begun
```

As you can see, about 26 seconds pass between when the test command is issued and when the first test actually runs. This delay seems small, but it is noticeable and kinda annoying when that is the only test. You could load, present, and fire a musket in that time.

There are two possible causes that I can think of:

1) The test database takes a long time to set up
2) The test command takes a long time to import all required modules

However, if I `export REUSE_DB=1` and then run `bin/django test mentor_match -a'now' -s`, I see

    2014-03-05T15:22:51.743338
    before test
    nosetests --verbosity 1 mc_allocate -s -anow
    2014-03-05T15:23:15.813821
    test has begun

which is still a difference of 24 seconds. Working on a Linode, I often get notifications about the high disk I/O rate, so it makes sense that the cause is the latter.

So, I turn to the sysadmin's microscope: strace. With REUSE_DB not set, I run `strace -f bin/django test mentor_match -s -a'now' 2> tracelog`. I can now see all 67626 system calls made by the command. `cat tracelog | grep "open(" | wc -l` shows that 37635 of those are open calls.
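The same count can be taken for every syscall in one pass over the log. Below is a minimal sketch, assuming the log has the usual `strace -f` line shape (`[pid N] name(args) = result`). The helper names `tally_syscalls` and `report` are made up for illustration. A call split across `<unfinished ...>` and `resumed` lines is counted once, but its ENOENT result is missed.

```python
import re
from collections import Counter

# Syscall name at the start of a line, allowing for the "[pid 1234] " prefix
# that `strace -f` adds for child processes. (Assumed log shape.)
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def tally_syscalls(lines):
    """Return (calls per syscall name, ENOENT failures per syscall name)."""
    calls = Counter()
    missing = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match is None:
            continue  # signal notices, exit notices, "<... resumed>" lines
        name = match.group(1)
        calls[name] += 1
        if "ENOENT" in line:
            missing[name] += 1
    return calls, missing


def report(calls, missing):
    """Format the tallies as text rows, most frequent syscall first."""
    total = sum(calls.values())
    rows = ["total: %d" % total]
    for name, count in calls.most_common():
        share = 100.0 * count / total
        rows.append("%-8s %6d  %5.1f%%  ENOENT: %d"
                    % (name + ":", count, share, missing[name]))
    return rows

# Usage against the log written above (file name taken from the strace command):
#     with open("tracelog") as log:
#         print("\n".join(report(*tally_syscalls(log))))
```

Splitting the ENOENT count by syscall shows whether the misses come mostly from `open` or from `stat64`, which the single grep does not reveal.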
    total:    67626
    open:     37635   55.7%
    stat64:   15401   22.8%
    read:      5362    7.9%
    fstat64:   2619    3.9%
    write:     2316    3.4%
    poll:      2268    3.4%
    close:     1754    2.6%
    mmap2:      949    1.4%
    munmap:     847    1.3%

Since there are far more open calls than read calls, and there are 45847 lines containing "ENOENT (No such file or directory)" (86.4% of the open or stat64 calls), I think that Django spends a lot of time looking for files in locations where they don't exist.

I plan to write a blog post about this, but before I dig in, does anyone have any comments on the following approaches I'm considering:

- reordering the paths assigned to sys.path
- removing items from sys.path when I am running tests
- caching the locations where modules were previously found and looking there first

Has anyone looked at doing anything like this?

--
Andrew Farrell
Software Engineer | MassChallenge, Inc.
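P.S. A rough sketch of how the first approach (reordering sys.path) could be explored: after the imports have happened, count how many loaded modules came from each sys.path entry, then move the busiest entries to the front. The helper names are hypothetical. The sketch also assumes that no package exists under two entries, because reordering would otherwise change which copy gets imported.

```python
import os
import sys
from collections import Counter


def loaded_module_files():
    """Absolute file paths of every module imported so far that has one."""
    files = []
    for module in list(sys.modules.values()):
        filename = getattr(module, "__file__", None)
        if filename:
            files.append(os.path.abspath(filename))
    return files


def hits_per_path_entry(module_files, path_entries):
    """Count how many module files live under each path entry.

    A file is attributed to the longest entry that contains it, so a file
    inside an egg is not credited to a parent directory that is also listed.
    """
    hits = Counter()
    for entry in path_entries:
        hits[entry] = 0
    for filename in module_files:
        owners = [entry for entry in path_entries
                  if entry and filename.startswith(entry.rstrip(os.sep) + os.sep)]
        if owners:
            hits[max(owners, key=len)] += 1
    return hits


def reorder_by_hits(path_entries, hits):
    """Stable sort: entries that supplied the most modules come first."""
    return sorted(path_entries, key=lambda entry: -hits[entry])

# Possible use at the end of a test run, to print a suggested order:
#     hits = hits_per_path_entry(loaded_module_files(), sys.path)
#     for entry in reorder_by_hits(sys.path, hits):
#         print("%4d  %s" % (hits[entry], entry))
```

Entries with zero hits are candidates for the second approach (dropping them during tests), though an entry can be needed only by code paths that the measured run never touched.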