Hi Chris:
I used -nodepara 2 to fully occupy the two CPUs on my system with a
paratest.server run, and it terminated OK. I did not set up the experiment so
that a test timed out on one of the workers, nor did I investigate what happens
when paratest.client hangs or crashes. So my experiment just represents the
"happy path". However, it would appear that paratest.server is basically
functioning correctly, even when the available CPUs are fully utilized.
Another thing I did not check was whether paratest is trying to read the output
log file before the worker has had a chance to open it. Perhaps the server
script assumes that the workers are reasonably responsive in this way. If they
are not, that might explain the grep: Logs/ ... messages. I have usually
interpreted those messages to mean that the corresponding worker died.
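One way to rule out the "log read before the worker opened it" race is to wait for the file to appear before reading it. The following is a minimal Python sketch of that idea, not code from paratest.server; the function name, timeout, and polling interval are all hypothetical.

```python
import os
import time
from typing import Optional


def read_log_when_ready(path: str,
                        timeout_s: float = 5.0,
                        poll_interval_s: float = 0.1) -> Optional[str]:
    """Return the log contents, or None if the file never appears in time."""
    deadline = time.monotonic() + timeout_s
    # Poll for the file instead of assuming the worker has already created it.
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return None  # Worker never opened the log: treat it as missing.
        time.sleep(poll_interval_s)
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        return log.read()
```

A caller that gets None back can report "worker produced no log" explicitly, which separates a slow worker from a dead one.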
In a separate test, I checked whether the Perl "sleep" function does
something lame like busy-waiting without yielding. At least on my SLES system,
it does not busy-wait. So the assumption that paratest.server and the workers
can make progress in parallel is upheld on this platform.
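A busy-wait check of this kind can be sketched by comparing the CPU time a process consumes with the wall-clock time that elapses across a sleep. The example below uses Python rather than Perl and is only an illustration of the measurement, not the test that was actually run; the function name is hypothetical.

```python
import time


def cpu_fraction_during_sleep(seconds: float = 1.0) -> float:
    """Return CPU time used divided by wall time elapsed across one sleep."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # CPU time charged to this process
    time.sleep(seconds)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used
```

A result near 0 suggests the sleep yields the CPU; a result near 1 would suggest busy-waiting.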
I'm inclined to think that paratest.server is working as intended. It would
seem, rather, that paratest.client is not handling all error cases correctly.
I would look there for the fault. (But since my mods were only on the server
side, I have so far avoided doing so.)
In order to make further progress on your problem, I think I would need to
duplicate it. If you wish, you can send me a patch, and I will give it a try
on my system.
THH
________________________________
From: Tom Hildebrandt [[email protected]]
Sent: Saturday, April 05, 2014 2:50 PM
To: Chris Wailes
Cc: [email protected]
Subject: Re: [Chapel-developers] Paratest and TooManyThreads.chpl
Hi Chris:
I did draw attention to my change that removed the "wait;" statement from the
loop in paratest.server that waits for all child processes to complete. That,
combined with your observation that you are unable to create more threads,
points at the problem: there are not enough physical threads to go around; at
least one of them is dying of starvation. There ought to be more than enough
threads to go around, so perhaps the problem also involves mismatched
priorities.
As it stands, paratest.server expects there to be at least w+1 threads
available (for w workers and paratest.server itself) and for scheduling among
those threads to be reasonably fair.
I have assumed all along that the call to sleep() in that wait loop yields to
waiting threads. If not, then we might need to find a different way to pass
the time between checking for updates. I have not examined the paratest.client
script to see if there are potential gotchas there.
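The pattern under discussion, a loop that checks on child processes without blocking and sleeps between checks so that a timeout can be re-evaluated regularly, can be sketched as follows. This is a Python illustration under stated assumptions, not the Perl code in paratest.server; the function name and parameters are hypothetical.

```python
import subprocess
import time
from typing import List, Tuple


def wait_for_workers(procs: List[subprocess.Popen],
                     timeout_s: float,
                     poll_interval_s: float = 1.0
                     ) -> Tuple[List[subprocess.Popen], List[subprocess.Popen]]:
    """Poll workers until all exit or the deadline passes.

    Returns (finished, still_running).
    """
    deadline = time.monotonic() + timeout_s
    running = list(procs)
    finished: List[subprocess.Popen] = []
    while True:
        # poll() never blocks: it returns None while the child is still alive.
        still = [p for p in running if p.poll() is None]
        finished.extend(p for p in running if p.returncode is not None)
        running = still
        if not running or time.monotonic() >= deadline:
            break
        # Sleeping (rather than spinning) leaves the CPU to the workers.
        time.sleep(poll_interval_s)
    return finished, running
```

Because the parent never blocks in a wait call, it depends on the scheduler giving the workers a fair share of time while it sleeps, which is the assumption questioned above.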
I'll play with this a bit, and see if I can duplicate your problem on my
workstation.
THH
________________________________
From: Chris Wailes [[email protected]]
Sent: Friday, April 04, 2014 3:43 PM
To: Brad Chamberlain
Cc: Tom Hildebrandt; Lydia Duncan; [email protected]
Subject: Re: [Chapel-developers] Paratest and TooManyThreads.chpl
After bisecting the commit log I found that commit 22715 is responsible for
this issue. Oddly, before the script actually exits my system becomes unable
to create new threads and grep says it is unable to find log files.
- Chris
On Mon, Mar 31, 2014 at 7:03 PM, Brad Chamberlain
<[email protected]> wrote:
I don't have any insights, but will note that in our use cases, we tend not to
use paratest to oversubscribe testing on a single machine; rather we farm out
across multiple machines; so there may be some race/conflict which only shows
up in that situation?
Assuming any issue is in the paratest servers themselves, it shouldn't take you
long to do the binary search -- I think there have only been five changes to it
since Jan.
-Brad
On Mon, 31 Mar 2014, Tom Hildebrandt wrote:
Hi Chris:
The other change that I made in paratest.server was to remove the "wait"
command on line 172 or thereabouts, so the timeout time is updated each
second. I can't really see how this would cause the error messages you're
seeing. On the other hand, I have never tested by forking a number of
children equal to the number of processors available. I'll give that a try
(most likely this evening).
Tom H.
_____________________________________________________________________________
From: Chris Wailes [[email protected]]
Sent: Monday, March 31, 2014 3:34 PM
To: Tom Hildebrandt
Cc: Brad Chamberlain; Lydia Duncan; [email protected]
Subject: Re: [Chapel-developers] Paratest and TooManyThreads.chpl
I've been playing with this for a couple of days now, and even with skipif
files for what I thought were the offending directories I end up getting the
following output (https://gist.github.com/chriswailes/a1b0c4d8df4eb983607c)
before the paratest.server script fails. Running start_test works just fine,
but if I try to run even 4 tests at once on my quad-core, hyperthreaded
machine, I get these error messages.
I haven't been as diligent with my rebasing as I should have been, so the
last time I know the mainline's version of the scripts worked was on January
29th. Does anyone know what might have changed since then to have caused
this problem? Before I was able to run 10 tests at a time on this same
machine. I'm about to head home now, but tomorrow I'll run a binary search
on the commit history to try and pin down the commit that caused this to stop
working.
- Chris
On Fri, Mar 28, 2014 at 12:32 PM, Tom Hildebrandt
<[email protected]> wrote:
That is correct.
Note also that the .skipif file that skips a directory and its
descendants is a sibling of the directory to be skipped, whereas
the directory-wide SKIPIF file resides within the directory it
affects. Compare
test/chpldoc          <-- Skip testing here and in all descendants
test/chpldoc.skipif   <-- if this script tests true.
vs.
test/distributions/deitz/SKIPIF   <-- Skip testing in the containing
                                      directory (only) if this script tests true.
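The two rules can be illustrated with a small directory walk. This is a Python sketch of the semantics described above, not the test system's actual traversal; `dirs_to_test` and the `is_true` callback (a stand-in for "this skipif script tests true") are hypothetical.

```python
import os
from typing import Callable, Iterator


def dirs_to_test(root: str, is_true: Callable[[str], bool]) -> Iterator[str]:
    """Yield the directories under root that would still be tested."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sibling rule: <dirname>.skipif next to dirname prunes dirname and
        # everything below it (editing dirnames in place stops the descent).
        dirnames[:] = sorted(
            d for d in dirnames
            if not (f"{d}.skipif" in filenames
                    and is_true(os.path.join(dirpath, f"{d}.skipif")))
        )
        # In-directory rule: SKIPIF skips only this directory; its
        # children are still visited.
        if "SKIPIF" in filenames and is_true(os.path.join(dirpath, "SKIPIF")):
            continue
        yield dirpath
```

With the layout above and both scripts testing true, test/chpldoc and its descendants are never visited, while only test/distributions/deitz itself is skipped and any subdirectories of it are still tested.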
THH
________________________________________
From: Brad Chamberlain [[email protected]]
Sent: Friday, March 28, 2014 6:37 AM
To: Lydia Duncan; [email protected]
Subject: Re: [Chapel-developers] Paratest and TooManyThreads.chpl
IIRC, a difference between the two approaches is that putting it in
the parent skips all recursive traversal below that directory as well,
whereas putting it within the directory just skips that directory, but
not its children?
-Brad
________________________________________
From: Lydia Duncan [[email protected]]
Sent: Thursday, March 27, 2014 2:57 PM
To: [email protected]
Subject: Re: [Chapel-developers] Paratest and TooManyThreads.chpl
On 03/27/2014 02:53 PM, Chris Wailes wrote:
> Do skipif files work for directories?
Yup! You can either make a SKIPIF within the directory, or make a
<dirname>.skipif file in its parent directory.
Lydia
------------------------------------------------------------------------------
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers