Hi Chris:

I did draw attention to my change that removed the "wait;" statement from the 
loop in paratest.server that waits for all child processes to complete.  That, 
combined with your observation that you are unable to create more threads, 
points at the problem: there are not enough physical threads to go around; at 
least one of them is dying of starvation.  There ought to be more than enough 
threads to go around, so perhaps the problem also involves mismatched 
priorities.

As it stands, paratest.server expects there to be at least w+1 threads 
available (for w workers and paratest.server itself) and for scheduling among 
those threads to be reasonably fair.

I have assumed all along that the call to sleep() in that wait loop yields to 
waiting threads.  If not, then we might need to find a different way to pass 
the time between checking for updates.  I have not examined the paratest.client 
script to see if there are potential gotchas there.
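
As a minimal sketch of the kind of wait loop described above (written in 
Python for illustration; this is not the paratest.server code, which is not 
shown in this thread), the parent polls its children without blocking and 
sleeps between checks.  The names wait_for_workers, timeout_s and 
poll_interval_s are hypothetical, and the sketch assumes the workers are 
separate OS processes, so the parent's sleep leaves the CPU to them.

```python
import subprocess
import sys
import time


def wait_for_workers(procs, timeout_s, poll_interval_s=1.0):
    """Poll until every worker exits or the deadline passes.

    Returns the workers that are still running when the function returns
    (an empty list means all of them finished in time).
    """
    deadline = time.monotonic() + timeout_s
    pending = list(procs)
    while pending and time.monotonic() < deadline:
        # poll() does not block: it returns None while the child is running.
        pending = [p for p in pending if p.poll() is None]
        if pending:
            # The parent sleeps between checks instead of spinning, so the
            # timeout can be re-evaluated on every pass through the loop.
            time.sleep(poll_interval_s)
    # Re-check once so the result is not stale after the last sleep.
    return [p for p in pending if p.poll() is None]


if __name__ == "__main__":
    # Four short-lived stand-in workers (hypothetical workload).
    workers = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
        for _ in range(4)
    ]
    leftover = wait_for_workers(workers, timeout_s=10, poll_interval_s=0.1)
    print("workers still running:", len(leftover))
```

Unlike a blocking wait on all children, this form lets the parent notice a 
timeout while children are still running, at the cost of needing one more 
schedulable thread for the parent itself.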

I'll play with this a bit, and see if I can duplicate your problem on my 
workstation.

THH
________________________________
From: Chris Wailes [[email protected]]
Sent: Friday, April 04, 2014 3:43 PM
To: Brad Chamberlain
Cc: Tom Hildebrandt; Lydia Duncan; [email protected]
Subject: Re: [Chapel-developers] Paratest and TooManyThreads.chpl

After bisecting the commit log I found that commit 22715 is responsible for 
this issue.  Oddly, before the script actually exits my system becomes unable 
to create new threads and grep says it is unable to find log files.

- Chris


On Mon, Mar 31, 2014 at 7:03 PM, Brad Chamberlain 
<[email protected]<mailto:[email protected]>> wrote:

I don't have any insights, but will note that in our use cases, we tend not to 
use paratest to oversubscribe testing on a single machine; rather we farm out 
across multiple machines; so there may be some race/conflict which only shows 
up in that situation?

Assuming any issue is in the paratest servers themselves, it shouldn't take you 
long to do the binary search -- I think there have only been five changes to it 
since Jan.

-Brad



On Mon, 31 Mar 2014, Tom Hildebrandt wrote:

Hi Chris:
The other change that I made in paratest.server was to remove the "wait"
command on line 172 or thereabouts, so the timeout time is updated each
second.  I can't really see how this would cause the error messages you're
seeing.  On the other hand, I have never tested by forking a number of
children equal to the number of processors available. I'll give that a try
(most likely this evening).

Tom H.

_____________________________________________________________________________

From: Chris Wailes [[email protected]<mailto:[email protected]>]
Sent: Monday, March 31, 2014 3:34 PM
To: Tom Hildebrandt
Cc: Brad Chamberlain; Lydia Duncan; 
[email protected]<mailto:[email protected]>
Subject: Re: [Chapel-developers] Paratest and TooManyThreads.chpl

I've been playing with this for a couple of days now, and even with skipif
files for what I thought were the offending directories I end up getting the
following output (https://gist.github.com/chriswailes/a1b0c4d8df4eb983607c)
before the paratest.server script fails.  Running start_test works just fine,
but if I try to run even 4 tests at once on my quad-core, hyperthreaded
machine, I get these error messages.

I haven't been as diligent with my rebasing as I should have been, so the
last time I know the mainline's version of the scripts worked was on January
29th.  Does anyone know what might have changed since then to have caused
this problem?  Previously, I was able to run 10 tests at a time on this same
machine.  I'm about to head home now, but tomorrow I'll run a binary search
on the commit history to try and pin down the commit that caused this to stop
working.

- Chris


On Fri, Mar 28, 2014 at 12:32 PM, Tom Hildebrandt 
<[email protected]<mailto:[email protected]>> wrote:
      That is correct.
      Note also that the .skipif file that skips a directory and its
      descendants is a sibling of the directory to be skipped, whereas
      the directory-wide SKIPIF file resides within the directory it
      affects.  Compare
        test/chpldoc         <-- Skip testing here and in all descendants
        test/chpldoc.skipif  <-- if this script tests true.
      vs.
        test/distributions/deitz/SKIPIF <-- Skip testing in the containing
                                            directory (only) if this script
                                            tests true.
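
      The two placements compared above can be sketched as a lookup.  This
      is an illustration in Python, not the test system's own code:
      skip_reason and script_is_true are hypothetical names, and how a
      skipif script is actually evaluated is left to the caller.

```python
import os
import tempfile


def skip_reason(test_dir, root, script_is_true):
    """Return the skipif file that causes test_dir to be skipped, or None.

    script_is_true(path) is a hypothetical callback that evaluates a skipif
    script and reports whether it "tests true".
    """
    test_dir = os.path.abspath(test_dir)
    root = os.path.abspath(root)

    # A SKIPIF file inside the directory affects that directory only.
    own = os.path.join(test_dir, "SKIPIF")
    if os.path.isfile(own) and script_is_true(own):
        return own

    # A <dirname>.skipif file beside a directory skips it and everything
    # below it, so check test_dir and each ancestor up to (excluding) root.
    current = test_dir
    while current != root and os.path.dirname(current) != current:
        sibling = current + ".skipif"
        if os.path.isfile(sibling) and script_is_true(sibling):
            return sibling
        current = os.path.dirname(current)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "test")
        os.makedirs(os.path.join(root, "chpldoc", "sub"))
        os.makedirs(os.path.join(root, "distributions", "deitz", "child"))
        open(os.path.join(root, "chpldoc.skipif"), "w").close()
        open(os.path.join(root, "distributions", "deitz", "SKIPIF"), "w").close()
        for rel in ("chpldoc/sub", "distributions/deitz",
                    "distributions/deitz/child"):
            # Treat every skipif script as testing true for this demo.
            print(rel, "->", skip_reason(os.path.join(root, rel), root,
                                         lambda path: True))
```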

      THH
      ________________________________________
      From: Brad Chamberlain [[email protected]<mailto:[email protected]>]
      Sent: Friday, March 28, 2014 6:37 AM
      To: Lydia Duncan; 
[email protected]<mailto:[email protected]>
      Subject: Re: [Chapel-developers] Paratest and TooManyThreads.chpl

      IIRC, a difference between the two approaches is that putting it in
      the parent skips all recursive traversal below that directory as well,
      whereas putting it within the directory just skips that directory, but
      not its children?

      -Brad

      ________________________________________
      From: Lydia Duncan [[email protected]<mailto:[email protected]>]
      Sent: Thursday, March 27, 2014 2:57 PM
      To: 
[email protected]<mailto:[email protected]>
      Subject: Re: [Chapel-developers] Paratest and TooManyThreads.chpl

      On 03/27/2014 02:53 PM, Chris Wailes wrote:
      > Do skipif files work for directories?
      Yup!  You can either make a SKIPIF within the directory, or make a
      <dirname>.skipif file in its parent directory.

      Lydia

------------------------------------------------------------------------------
      _______________________________________________
      Chapel-developers mailing list
      
[email protected]<mailto:[email protected]>
      https://lists.sourceforge.net/lists/listinfo/chapel-developers




