On Fri, May 13, 2011 at 6:57 PM, Erik Rose <[email protected]> wrote:

> tl;dr: I've written an alternative TestCase base class which makes
> fixture-using tests much more I/O efficient on transactional DBs, and I'd
> like to upstream it.
>
> Greetings, all! This is my first django-dev post, so please be gentle. :-)
> I hack on support.mozilla.com, a fairly large Django site with about 1000
> tests. Those tests make heavy use of fixtures and, as a result, used to take
> over 5 minutes to run. So, I spent a few days seeing if I could cut the
> amount of DB I/O needed. Ultimately, I got the run down to just over 1
> minute, and almost all of those gains are translatable to any Django site
> running against a transactional DB. No changes to the apps themselves are
> needed. I'd love to push some of this work upstream, if there's interest (or
> even lack of opposition ;-)).
>
> The speedups came from 3 main optimizations:
>
> 1. Class-level fixture setup
>
> Given a transaction DB, there's no reason to reload fixtures via dozens of
> SQL statements before every test. I made use of setup_class() and
> teardown_class() (yay, unittest2!) to change the flow for TestCase-using
> tests to this:
>    a. Load the fixtures at the top of the class, and commit.
>    b. Run a test.
>    c. Roll back, returning to pristine fixtures. Go back to step b.
>    d. At class teardown, figure out which tables the fixtures loaded into,
>       and expressly clear out what was added.
>
> Before this optimization: 302s to run the suite
> After: 97s.
>
> Before: 37,583 queries
> After: 4,116
>
> On top of that, an additional 4s was saved by reusing a single connection
> rather than opening and closing them all the time, bringing the final number
> down to 93s. (We can get away with this because we're committing any
> on-cursor-initialization setup, whereas the old TestCase rolled it back.)
>
> Here's the code:
> https://github.com/erikrose/test-utils/blob/master/test_utils/__init__.py#L121.
> I'd love to generalize it a bit (to fall back to the old behavior with
> non-transactional backends, for example) and offer it as a patch to Django
> proper, replacing TestCase. Thoughts?
>
> (If you notice that copy-and-paste of loaddata sitting off to the side in
> another module, don't fret; in the patch, that would turn into a refactoring
> of loaddata to make the computation of the fixture-referenced tables
> separately reusable.)
>
>

This is the one I'm most interested in. A number of months ago I wrote a patch that did the fixture parsing, but not the DB insertion, on a per-class basis. I didn't find that to be a big win. However, I'm going to be working on a patch to do bulk inserts (that is, a single execute/executemany call for all objects to be inserted), which could be a big win for fixture loading, so I'd like to do that first and then see how much this optimization still gains on top of it. Your approach is obviously more specialized, and more invasive (IMO), so if we can get most of the win without it, that might be good enough.
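To make "bulk inserts" concrete, here is a minimal sketch using Python's standard-library sqlite3 module rather than Django's loaddata internals. The fixture_item table, its columns, and the helper names insert_one_by_one and bulk_insert are made up for illustration; this is not the patch itself, just the shape of the idea: one executemany call for all objects instead of one execute call per object.

```python
import sqlite3

# Hypothetical table and statement, chosen only for this illustration.
INSERT_SQL = "INSERT INTO fixture_item (id, name) VALUES (?, ?)"


def make_db():
    """Create a throwaway in-memory database with one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fixture_item (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def insert_one_by_one(conn, rows):
    """Baseline: issue one execute() call per object."""
    for row in rows:
        conn.execute(INSERT_SQL, row)


def bulk_insert(conn, rows):
    """Bulk variant: hand every object to a single executemany() call."""
    conn.executemany(INSERT_SQL, rows)


rows = [(i, "item-%d" % i) for i in range(1, 1001)]

slow = make_db()
insert_one_by_one(slow, rows)
slow.commit()

fast = make_db()
bulk_insert(fast, rows)
fast.commit()

# Both approaches must leave the table in the same state.
count_slow = slow.execute("SELECT COUNT(*) FROM fixture_item").fetchone()[0]
count_fast = fast.execute("SELECT COUNT(*) FROM fixture_item").fetchone()[0]
```

Both paths end up with the same rows. How much time the single call actually saves depends on the backend and driver, which is exactly what I'd want to measure before deciding how much the per-class approach still buys us.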
>
> 2. Fixture grouping
>
> I next observed that many test classes reused the same sets of fixtures,
> often via subclassing. After the previous optimization, our tests still
> loaded fixtures 114 times, even though there were only 11 distinct sets of
> them. So, I thought: why not write a custom testrunner that buckets the
> classes by fixture set and advises the classes that, unless they're the
> first or last in a bucket, they shouldn't bother tearing down or setting up
> the fixtures, respectively? This took the form of a custom nose plugin (we
> use nose for all our Django stuff), and it took another quarter off the test
> run:
>
> Before: 97s
> After: 74s
>
> Of course, test independence is still preserved. We're just factoring out
> pointlessly repeated setup.
>
> I don't really have plans to upstream this unless someone calls for it, but
> I'll be making it available soon, likely as part of django-nose.
>
>

No particular thoughts at the moment.

> 3. Startup optimizations
>
> At this point, it was bothering me that, just to run a single test, I had
> to wait through 15s of DB initialization (mostly auth_permissions and
> django_content_type population), stuff which was already perfectly valid from
> the previous test run. So, building on some work we had already done in this
> direction, I decided to skip the teardown of the test DB and, symmetrically,
> the setup on future runs. If you make schema changes, just set an env var,
> and it wipes and remakes the DB like usual. I could see pushing this into
> django-nose as well, but it's got the hackiest implementation and can
> potentially confuse users. I mention it for completeness.
>
> Before: startup time 15s
> After: 3s (There's quite a wide variance due to I/O caching luck.)
>
> Code: https://github.com/erikrose/test-utils/commit/b95a1b7
>
>

This is another case where I think we can get most of the win from doing the bulk inserts.
Given this looks rather specialized (and if you have other custom syncdb hooks, does it break?), I'd like to avoid it if possible.

> If you read this far, you get a cookie! I welcome your feedback on merging
> optimization #1 into core, as well as any accusations of insanity re: #2 and
> #3. FWIW, everything works great without touching any of the tests on 3 of
> our Django sites, totaling over 2000 tests.
>
> Best regards and wishes for a happy weekend,
> Erik Rose
> support.mozilla.com
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/django-developers?hl=en.
>
>

Speeding up tests is definitely of interest to me, so thanks for the great work!

Thanks!

--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
