On Thu, Mar 6, 2014 at 4:52 AM, Andrew Farrell <[email protected]>wrote:

> Привет Djanonaughts!
>
> tl;dr: django with buildout spends a lot of time looking for files that
> aren't there when trying to do imports. How can I reduce this?
>
>
>
> I've long been frustrated with the time it takes my django installation to
> run tests. Now, I'm aware that some ways to think about to how to write
> fast and efficient tests in 
> django<http://www.slideshare.net/cordiskinsey/djangocon-2013-how-to-write-fast-and-efficient-unit-tests-in-django>
>  include:
> - putting things in setUpClass rather than setUp
> - treating the database as hot lava (at least, when you don't have
> ManyToMany relations)
> - using sqlite3 in-memory
> - using mock to emulate instances
> - using nose to only run the tests you need
> But I want to talk about another component of test performance: Disk I/O
> syscalls
>
> My company (the startup accelerator MassChallenge) uses 
> buildout<http://www.buildout.org/en/latest/> to
> manage dependencies and the several apps modules within our appication. So
> I run tests from the root of the buildout by calling `bin/django test`, a
> script that looks like the following: (note the datetime printing)
> ```
> #!/home/afarrell/projects/masschallenge/venv/bin/python
> from datetime import datetime
> print datetime.now().isoformat()
> print "before test"
>
> import sys
> sys.path[0:0] = [
>
>   
> '/home/afarrell/projects/masschallenge/mc2013/eggs/MySQL_python-1.2.3-py2.7-linux-i686.egg',
>   ... 38 other paths to various eggs...
>   '/home/afarrell/projects/masschallenge/venv/lib/python2.7/site-packages',
>   '/home/afarrell/projects/masschallenge/mc2013', #the root of our app
>   ]
>
> import djangorecipe.manage
>
> if __name__ == '__main__':
>     sys.exit(djangorecipe.manage.main('mcproject.develop_settings'))
> ```
>
> I am in the habit of writing a single test as I am writing a method. To do
> this, I use nose's attrib 
> tag<http://nose.readthedocs.org/en/latest/plugins/attrib.html> in
> a test file that looks like this:
> ```
> from nose.plugins.attrib import attr
> import nose.tools as nt
> from django.test import TestCase
> from datetime import datetime
>
> class TestExample(Testcase):
>     @attr('now')
>     def test_trivial_case(self):
>         print datetime.now().isoformat()
>         print "test has begun"
>         nt.assert_equal(1 + 1, 2)
> ```
>
> I then run tests with `bin/django test mentor_match -a'now' -s` and this
> prints out the following:
> ```
> 2014-03-05T13:46:02.774142
> before test
> nosetests --verbosity 1 mentor_match -anow -s
> To reuse old database "test_02_06_18_29" for speed, set env var REUSE_DB=1.
> Creating test database for alias 'default'...
> 2014-03-05T13:46:28.612895
> test has begun
> ```
> As you can see, there are about 26 seconds from the test command is issued
> and when the first test actually runs. This delay seems small but it is
> noticeable and kinda annoying when that is the only test. You could load,
> present, and fire a musket in that time. There are two possible causes that
> I can think of:
> 1) The test database takes a long time to set up
> 2) The test command takes a long time to import all required modules
>
> However the if I `export REUSE_DB=1` and then run `bin/django test
> mentor_match -a'now' -s`, I see
> 2014-03-05T15:22:51.743338
> before test
> nosetests --verbosity 1 mc_allocate -s -anow
> 2014-03-05T15:23:15.813821
> test has begun
> which is still a difference of 24 seconds. Working on a linode, I often
> get notifications about the high disk i/o rate, so it makes sense that the
> cause is the latter. So, I turn to the sysadmin's microsope: strace.
>
> With REUSE_DB not set, I run `strace -f bin/django test mentor_match -s
> -a'now' 2> tracelog`. I can now see all of the system calls made by the
> command, 67626. `cat tracelog | grep "open(" | wc -l` shows that 37635 of
> those are open commands.
>
> total:   67626
> open:    37635  56.7%
> stat64:  15401  22.8%
> read:     5362   7.9%
> fstat64:  2619   3.9%
> write:    2316   3.4%
> poll:     2268   3.4%
> close:    1754   2.6%
> mmap2:     949   1.4%
> munmap:    847   1.3%
>
> Since there are far more open calls than there are read calls, and
> there are 45847 lines containing "ENOENT (No such file or directory)"
> (86.4% of the open or stat64 calls), I think that django spends a lot of
> time looking for files in the wrong location.
>
>
> I plan to write a blog post about this, but before I dig in, does anyone
> have any comments on the following approaches I'm considering:
> - reordering the paths assigned to sys.path
> - removing items from sys.path when I am running tests
> - caching the locations where modules were previously found and looking
> there first
>
> Has anyone looked at doing anything like this?
>

Not to my knowledge, but based on your data, it certainly looks like it
would be worthwhile doing so.

I have to say that the data you've presented here surprises me. I can't
think of any obvious reasons why Django would be spending that much time
trying to open non-existent files - at least, not to the extent that you
would observe 24 seconds of startup time - unless your disk performance was
*really* bad. However, I know from experience isn't generally a problem on
Linode.

The only "speculative opening" behaviour I can think of would be the
attempted imports that are performed as part of identifying INSTALLED_APPS
and management commands - that is, looking for each models.py in your
INSTALLED_APPS, and looking for the management/commands folder in each
installed app. However, this should be a fairly constrained activity;
startup will only look in each app once, for a single folder that (in most
cases) won't exist. We're talking O(100) operations in the worst case for a
Django project - not the sort of thing I would have expected to cause 24
seconds of slowdown.

So - I'm very much interested in what you discover. Good hunting! (I
understand your weapon of choice is a musket, so you'll need the luck :-)

Also - you might want to bump this thread over to django-developers; you're
more likely to find a receptive audience for this sort of analysis.

Yours,
Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/CAJxq84-bk5c6b01nK8%2BmCfNPLw2ZYVX0y%2BSo6svPHcwrWfxKEQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to