On Thu, Mar 6, 2014 at 4:52 AM, Andrew Farrell <[email protected]>wrote:
> Привет Djanonaughts! > > tl;dr: django with buildout spends a lot of time looking for files that > aren't there when trying to do imports. How can I reduce this? > > > > I've long been frustrated with the time it takes my django installation to > run tests. Now, I'm aware that some ways to think about to how to write > fast and efficient tests in > django<http://www.slideshare.net/cordiskinsey/djangocon-2013-how-to-write-fast-and-efficient-unit-tests-in-django> > include: > - putting things in setUpClass rather than setUp > - treating the database as hot lava (at least, when you don't have > ManyToMany relations) > - using sqlite3 in-memory > - using mock to emulate instances > - using nose to only run the tests you need > But I want to talk about another component of test performance: Disk I/O > syscalls > > My company (the startup accelerator MassChallenge) uses > buildout<http://www.buildout.org/en/latest/> to > manage dependencies and the several apps modules within our appication. So > I run tests from the root of the buildout by calling `bin/django test`, a > script that looks like the following: (note the datetime printing) > ``` > #!/home/afarrell/projects/masschallenge/venv/bin/python > from datetime import datetime > print datetime.now().isoformat() > print "before test" > > import sys > sys.path[0:0] = [ > > > '/home/afarrell/projects/masschallenge/mc2013/eggs/MySQL_python-1.2.3-py2.7-linux-i686.egg', > ... 38 other paths to various eggs... > '/home/afarrell/projects/masschallenge/venv/lib/python2.7/site-packages', > '/home/afarrell/projects/masschallenge/mc2013', #the root of our app > ] > > import djangorecipe.manage > > if __name__ == '__main__': > sys.exit(djangorecipe.manage.main('mcproject.develop_settings')) > ``` > > I am in the habit of writing a single test as I am writing a method. To do > this, I use nose's attrib > tag<http://nose.readthedocs.org/en/latest/plugins/attrib.html> in > a test file that looks like this: > ``` > from nose.plugins.attrib import attr > import nose.tools as nt > from django.test import TestCase > from datetime import datetime > > class TestExample(Testcase): > @attr('now') > def test_trivial_case(self): > print datetime.now().isoformat() > print "test has begun" > nt.assert_equal(1 + 1, 2) > ``` > > I then run tests with `bin/django test mentor_match -a'now' -s` and this > prints out the following: > ``` > 2014-03-05T13:46:02.774142 > before test > nosetests --verbosity 1 mentor_match -anow -s > To reuse old database "test_02_06_18_29" for speed, set env var REUSE_DB=1. > Creating test database for alias 'default'... > 2014-03-05T13:46:28.612895 > test has begun > ``` > As you can see, there are about 26 seconds from the test command is issued > and when the first test actually runs. This delay seems small but it is > noticeable and kinda annoying when that is the only test. You could load, > present, and fire a musket in that time. There are two possible causes that > I can think of: > 1) The test database takes a long time to set up > 2) The test command takes a long time to import all required modules > > However the if I `export REUSE_DB=1` and then run `bin/django test > mentor_match -a'now' -s`, I see > 2014-03-05T15:22:51.743338 > before test > nosetests --verbosity 1 mc_allocate -s -anow > 2014-03-05T15:23:15.813821 > test has begun > which is still a difference of 24 seconds. Working on a linode, I often > get notifications about the high disk i/o rate, so it makes sense that the > cause is the latter. So, I turn to the sysadmin's microsope: strace. > > With REUSE_DB not set, I run `strace -f bin/django test mentor_match -s > -a'now' 2> tracelog`. I can now see all of the system calls made by the > command, 67626. `cat tracelog | grep "open(" | wc -l` shows that 37635 of > those are open commands. > > total: 67626 > open: 37635 56.7% > stat64: 15401 22.8% > read: 5362 7.9% > fstat64: 2619 3.9% > write: 2316 3.4% > poll: 2268 3.4% > close: 1754 2.6% > mmap2: 949 1.4% > munmap: 847 1.3% > > Since there are far more open calls than there are read calls, and > there are 45847 lines containing "ENOENT (No such file or directory)" > (86.4% of the open or stat64 calls), I think that django spends a lot of > time looking for files in the wrong location. > > > I plan to write a blog post about this, but before I dig in, does anyone > have any comments on the following approaches I'm considering: > - reordering the paths assigned to sys.path > - removing items from sys.path when I am running tests > - caching the locations where modules were previously found and looking > there first > > Has anyone looked at doing anything like this? > Not to my knowledge, but based on your data, it certainly looks like it would be worthwhile doing so. I have to say that the data you've presented here surprises me. I can't think of any obvious reasons why Django would be spending that much time trying to open non-existent files - at least, not to the extent that you would observe 24 seconds of startup time - unless your disk performance was *really* bad. However, I know from experience isn't generally a problem on Linode. The only "speculative opening" behaviour I can think of would be the attempted imports that are performed as part of identifying INSTALLED_APPS and management commands - that is, looking for each models.py in your INSTALLED_APPS, and looking for the management/commands folder in each installed app. However, this should be a fairly constrained activity; startup will only look in each app once, for a single folder that (in most cases) won't exist. We're talking O(100) operations in the worst case for a Django project - not the sort of thing I would have expected to cause 24 seconds of slowdown. So - I'm very much interested in what you discover. Good hunting! (I understand your weapon of choice is a musket, so you'll need the luck :-) Also - you might want to bump this thread over to django-developers; you're more likely to find a receptive audience for this sort of analysis. Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/django-users. To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CAJxq84-bk5c6b01nK8%2BmCfNPLw2ZYVX0y%2BSo6svPHcwrWfxKEQ%40mail.gmail.com. For more options, visit https://groups.google.com/groups/opt_out.

