Seems like there's no easy fix for this :( Since the workaround we're adopting is to run the Allura package's tests with `nosetests --processes=2` (or more), we should probably force ./run_tests to do that, so it doesn't cause problems for somebody trying out Allura on a single-core machine or VM. Any downside to that?
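To make that concrete, here's a minimal sketch of the clamp ./run_tests could apply. The function names and defaults are illustrative, not the actual run_tests code; only the nosetests flags come from this thread:

```python
# Hypothetical sketch of the proposed change: floor the nose worker
# count at 2 so the suite always runs in parallel, even on a
# single-core machine or VM. Names and defaults are illustrative.
import multiprocessing


def nose_process_count(minimum=2):
    """Return a --processes=N value that is never below `minimum`."""
    try:
        cores = multiprocessing.cpu_count()
    except NotImplementedError:
        cores = 1
    return max(cores, minimum)


def parallel_nose_args(timeout=180):
    """Build the nosetests flags used as the workaround in this thread."""
    return ['--processes=%d' % nose_process_count(),
            '--process-timeout=%d' % timeout]
```

On a single-core box this yields `--processes=2`; on a quad-core box, `--processes=4`.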
On 9/25/14 12:41 AM, Alex Luberg wrote:
> I have discovered that the suite passed with 756 tests, and if I added
> another test (just a copy of an existing one with a different name) it
> locked up at some test (which was not the one I copied). I suspect that it
> is not related to the actual test code, but something with
> nose/python/sandbox.
>
> On Mon, Sep 22, 2014 at 3:40 AM, Igor Bondarenko <[email protected]> wrote:
>
>> On Sat, Sep 20, 2014 at 12:25 AM, Dave Brondsema <[email protected]> wrote:
>>
>>> On 9/19/14 12:18 PM, Dave Brondsema wrote:
>>>> Starting with Igor's comments on
>>>> https://sourceforge.net/p/allura/tickets/7657/#c7d9
>>>>
>>>>> There's a couple of new tests commented out in the last commit. I
>>>>> can't figure out why, but they cause allura/tests/test_dispatch.py to
>>>>> hang when run together with other tests. I have also added and then
>>>>> removed tests for enable/disable user for the same reason.
>>>>>
>>>>> I think it needs another pair of eyes on it, since I've already spent
>>>>> too much time dealing with these tests and have no idea what's
>>>>> happening... Maybe I'm missing something obvious.
>>>>
>>>> Alex and I have seen this recently too, and it's hard to figure out
>>>> what exactly the problem is. I first noticed it when running
>>>> `./run_tests --with-coverage`, which runs nosetests in the Allura dir
>>>> but does not use --processes=N because of the with-coverage param. So
>>>> basically just a regular run of the tests in the Allura dir would cause
>>>> the CPU to go to 100% usage and the tests wouldn't finish. Couldn't
>>>> ctrl-C or profile them; had to kill -9 it.
>>>>
>>>> That was on CentOS 5.10, and a workaround was to run with
>>>> --processes=N, and then the tests would finish fine. On the Ubuntu
>>>> vagrant image, I didn't encounter any problem in the first place. So
>>>> perhaps it's related to the environment.
>>>>
>>>> I tried to narrow it down to a specific test that might be the culprit.
>>>> I found tests consistently got up to TestSecurity.test_auth (which is a
>>>> bit weird and old test anyway), and also that commenting out that test
>>>> let them all pass.
>>>>
>>>> But I'm pretty sure Alex said he dug into this as well and found
>>>> variation in which tests could cause the problem. I think he told me
>>>> that going back in git history to before the problem, and then adding a
>>>> single test (a copy of an existing one), caused the problem. So perhaps
>>>> some limit or resource tipping point is hit.
>>>>
>>>> Alex or Igor, any more data points you know from what you've seen?
>>>>
>>>> Anyone else seen anything like this? Or have ideas for how to approach
>>>> nailing it down better?
>>>
>>> I tried checking out branch je/42cc_7657 and going back to commit
>>> 4cc63586e5728d7d0c5c2f09150eb07eb7e4edc1 (before the tests were
>>> commented out) to see what happened for me:
>>>
>>> On vagrant/ubuntu, it froze at test_dispatch.py, same as for you. So
>>> some consistency there. Tests passed when I ran `nosetests
>>> --process-timeout=180 --processes=4 -v` in the Allura dir. It seemed
>>> slow at the end though; I almost thought it froze.
>>>
>>> On centos, it froze at a different spot with a regular nosetests run. It
>>> passed with `nosetests allura/tests/ --processes=4 --process-timeout=180
>>> -v`. For some reason (hopefully unrelated), I needed to specify the path
>>> "allura/tests/" to avoid an IOError from multiprocessing.
>>>
>>> So at least multiprocess tests still seem like a workaround for me.
>>> Note: ./run_tests picks a --processes=N value dynamically based on the
>>> machine's CPU cores, so with a single core you don't get multiple
>>> processes that way. Also note: if you have nose-progressive installed
>>> and active, it is incompatible with multiple processes.
>>
>> It works exactly as you described for me too.
>>
>> I've reverted some commits with those tests, since the problem is not
>> with them and they are useful
>> https://sourceforge.net/p/allura/tickets/7657/#8c06, and I also made a
>> fix in 42cc's Makefile (committed directly in master) so that it always
>> runs the tests in parallel (it turns out that here at 42cc we have
>> single-core CPUs on the boxes that run tests, which is why I got lockups
>> on our CI too :( )

-- 
Dave Brondsema : [email protected]
http://www.brondsema.net : personal
http://www.splike.com : programming
<><
