We're currently using `./run_tests -n 2 -m 4` on single-core machines and that works fine for us.
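For illustration, here is a minimal sketch of the kind of floor Dave suggests below, so that single-core machines and VMs never fall back to the in-process run that hangs. The pick_process_count helper is hypothetical and not Allura's actual run_tests code:

    import multiprocessing

    def pick_process_count(requested=None):
        """Use an explicit -n value if given, else the machine's CPU
        count, but never fall below 2 so nose's multiprocess plugin
        always kicks in, even on a single-core machine or VM."""
        count = requested or multiprocessing.cpu_count()
        return max(count, 2)

    # On a single-core box:    pick_process_count()  -> 2
    # With ./run_tests -n 4:   pick_process_count(4) -> 4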
On Mon, Sep 29, 2014 at 5:44 PM, Dave Brondsema <[email protected]> wrote:
> Seems like there's no easy fix for this :(
>
> Since the workaround we're adopting is to run the Allura package's tests
> with nosetests --processes=2 (or more), we should probably force
> ./run_tests to do that so it doesn't cause problems for somebody trying
> out Allura on a single-core machine or VM. Any downside to that?
>
> On 9/25/14 12:41 AM, Alex Luberg wrote:
> > I have discovered that the suite passed with 756 tests, and if I added
> > another test (just copied some existing one with a different name) it
> > locked up at some test (which was not the one I've copied). I suspect
> > that it is not related to the actual test code, but something with
> > nose/python/sandbox.
> >
> > On Mon, Sep 22, 2014 at 3:40 AM, Igor Bondarenko <[email protected]> wrote:
> >
> >> On Sat, Sep 20, 2014 at 12:25 AM, Dave Brondsema <[email protected]> wrote:
> >>
> >>> On 9/19/14 12:18 PM, Dave Brondsema wrote:
> >>>> Starting with Igor's comments on
> >>>> https://sourceforge.net/p/allura/tickets/7657/#c7d9
> >>>>
> >>>>> There's a couple of new tests commented out in the last commit. I
> >>>>> can't figure out why, but they cause allura/tests/test_dispatch.py
> >>>>> to hang when run together with other tests. Also I have added and
> >>>>> then removed tests for enable/disable user for the same reason.
> >>>>>
> >>>>> I think it needs another pair of eyes on it, since I've already
> >>>>> spent too much time dealing with these tests and have no idea
> >>>>> what's happening... Maybe I'm missing something obvious.
> >>>>
> >>>> Alex and I have seen this recently too, and it's hard to figure out
> >>>> what exactly the problem is. I first noticed it when running
> >>>> `./run_tests --with-coverage`, which would run nosetests in the
> >>>> Allura dir and would not use --processes=N because of the
> >>>> with-coverage param. So basically just a regular run of the tests in
> >>>> the Allura dir would cause the CPU to go into 100% usage and the
> >>>> tests wouldn't finish. Couldn't ctrl-C or profile them; had to
> >>>> kill -9 it.
> >>>>
> >>>> That was on CentOS 5.10, and a workaround was to run with
> >>>> --processes=N and then the tests would finish fine. On the Ubuntu
> >>>> vagrant image, I didn't encounter any problem in the first place.
> >>>> So perhaps it's related to the environment.
> >>>>
> >>>> I tried to narrow down to a specific test that might be the culprit.
> >>>> I found tests consistently got up to TestSecurity.test_auth (which
> >>>> is a bit weird and old test anyway). And also that commenting out
> >>>> that test let them all pass.
> >>>>
> >>>> But I'm pretty sure Alex said he dug into this as well and found
> >>>> variation in what tests could cause the problem. I think he told me
> >>>> that going back in git history before the problem, and then adding a
> >>>> single test (a copy of an existing one), caused the problem. So
> >>>> perhaps some limit or resource tipping point is hit.
> >>>>
> >>>> Alex or Igor, any more data points you know from what you've seen?
> >>>>
> >>>> Anyone else seen anything like this? Or have ideas for how to
> >>>> approach nailing it down better?
> >>>>
> >>>>
> >>> I tried checking out branch je/42cc_7657 and going back to commit
> >>> 4cc63586e5728d7d0c5c2f09150eb07eb7e4edc1 (before tests were commented
> >>> out) to see what happened for me:
> >>>
> >>> On vagrant/ubuntu, it froze at test_dispatch.py, same as for you. So
> >>> some consistency there. Tests passed when I ran `nosetests
> >>> --process-timeout=180 --processes=4 -v` in the Allura dir. Seemed
> >>> slow at the end though; almost thought it froze.
> >>>
> >>> On CentOS, it froze at a different spot with a regular nosetests run.
> >>> It passed with `nosetests allura/tests/ --processes=4
> >>> --process-timeout=180 -v`. For some reason (hopefully unrelated), I
> >>> needed to specify the path "allura/tests/" to avoid an IOError from
> >>> multiprocessing.
> >>>
> >>> So at least multiprocess tests still seem like a workaround for me.
> >>> Note: ./run_tests picks a --processes=N value dynamically based on
> >>> the machine's CPU cores, so with a single core you don't get multiple
> >>> processes that way. Also note: if you have nose-progressive installed
> >>> and active, that is incompatible with multiple processes.
> >>>
> >>>
> >> It works exactly as you described for me too.
> >>
> >> I've reverted some commits with those tests, since the problem is not
> >> with them and they are useful
> >> https://sourceforge.net/p/allura/tickets/7657/#8c06 and also made a
> >> fix in 42cc's Makefile (committed directly to master), so that it
> >> would always run tests in parallel (turns out here at 42cc we have
> >> single-core CPUs on the boxes that run tests; that's why I had locks
> >> on our CI also :( )
> >
> >
>
>
> --
> Dave Brondsema : [email protected]
> http://www.brondsema.net : personal
> http://www.splike.com : programming
> <><
>
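To recap the workaround settled on above, here is a hedged sketch of a small wrapper in the spirit of the 42cc Makefile fix; the wrapper itself is illustrative, and the real Makefile contents aren't reproduced here:

    import subprocess
    import sys

    # Always run the Allura package's tests in parallel, even on
    # single-core boxes. Flag values mirror the workaround above; the
    # explicit "allura/tests/" path avoids the multiprocessing IOError
    # Dave mentioned.
    cmd = [
        "nosetests", "allura/tests/",
        "--processes=4",
        "--process-timeout=180",
        "-v",
    ]
    sys.exit(subprocess.call(cmd))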
