Hi Brad, Soumen --
I don't know how much time might elapse between when the Chapel runtime
requests the creation of a system pthread (for Chapel task hosting) and
when that pthread becomes useful, but if it's long enough and the bodies
of the forall loops are small enough, it could be that the forall loops
are done before any new pthreads arrive to help. If so, then increasing
the iteration count from 2048 to a million or more might be enough to
ensure that actual parallelism occurs.
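A minimal sketch of that experiment follows. It assumes a recent Chapel release, because the `stopwatch` type in the `Time` module postdates 1.8.0, and older releases used a different timer type. `n` is a hypothetical config const, so the iteration count can be raised from the command line (e.g. `--n=2048` vs `--n=10000000`) without recompiling. The loop body is a placeholder for the real per-element work.

```chapel
use Time;

// Hypothetical problem size; override at run time with --n=<value>.
config const n = 1_000_000;

var A: [1..n] int;
var sw: stopwatch;

sw.start();
forall i in 1..n do
  A[i] = i % 7;          // placeholder for a small per-element loop body
sw.stop();

writeln("forall over ", n, " iterations took ", sw.elapsed(), " s");
```

Comparing the timings for small and large `n`, while watching a system monitor, gives a rough indication of whether the extra threads arrive in time to help.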
Assuming the default fifo tasking layer is being used, any system
pthreads created to host Chapel tasks will continue running, and looking
for work, after those tasks have completed. So if we create pthreads
for the tasks in the forall, those pthreads will continue to exist until
the program terminates. This would explain why the cores are busy after
the writeln().
Soumen, if you do
$CHPL_HOME/util/printchplenv
what is the CHPL_TASKS setting it reports?
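The same setting can also be checked from inside a program. This sketch assumes a Chapel version that provides the standard `ChplConfig` module, which exposes the CHPL_* settings a program was compiled with as params. The 1.8.0 release may not offer it.

```chapel
use ChplConfig;

// Report the tasking layer this program was compiled against
// (the CHPL_TASKS value in effect at compile time).
writeln("CHPL_TASKS = ", CHPL_TASKS);
```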
greg
On Wed, 29 Jan 2014, Brad Chamberlain wrote:
Hi Soumen --
Unless I'm missing something (and I don't think I am), by default, the two
forall loops ought to use four tasks/threads without any effort on your part
(i.e., no special flags or switches or anything).
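To see how many tasks a local forall is expected to use by default, recent Chapel releases expose `here.maxTaskPar`, which is the upper bound on tasks a data-parallel loop creates on a locale under the default settings, and `here.numPUs()`. These names come from current Chapel. Whether 1.8.0 exposes the same information under these names is an assumption.

```chapel
// Upper bound on the tasks a local forall uses by default (recent releases).
writeln("here.maxTaskPar = ", here.maxTaskPar);
// Processing units Chapel sees on this locale (physical cores by default).
writeln("here.numPUs()   = ", here.numPUs());
```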
The for loop and the writeln() would use only one task/thread. Depending on
what your functions are, it may be that these threads are short-running enough
that the OS doesn't move them around to use all four cores, but that would
surprise me slightly (in particular, I'd think that the threads would spread
out pretty quickly, if not be created in a spread-out manner).
That leads me to ask: What technique are you using to determine whether or
not four cores are being used?
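One way to check this empirically is the sketch below. It assumes a recent Chapel release (for `stopwatch` and `here.maxTaskPar`), and `spinSeconds` is a hypothetical config const. The program starts one task per expected core with `coforall`, has each task busy-wait for `spinSeconds`, and times the whole loop. If the wall-clock time is close to `spinSeconds` rather than a multiple of it, the tasks ran concurrently, and each core should show as busy in a system monitor while the program runs.

```chapel
use Time;

// Hypothetical amount of busy work per task, in seconds.
config const spinSeconds = 1.0;
const nTasks = here.maxTaskPar;

var done: atomic int;    // counts tasks that finished spinning
var wall: stopwatch;

wall.start();
coforall tid in 0..#nTasks {
  var sw: stopwatch;
  sw.start();
  while sw.elapsed() < spinSeconds { }   // busy-wait keeps this core busy
  done.add(1);
}
wall.stop();

writeln(nTasks, " tasks, each spinning ", spinSeconds, " s, finished in ",
        wall.elapsed(), " s");
```

A wall time near `nTasks * spinSeconds` would instead suggest the tasks were serialized onto fewer cores.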
I'll note that we have some current work going on to better map specific tasks
to specific NUMA domains within a node, a prototype of which was included in
the 1.8.0 release, but I don't think that should be necessary to end up using
all of your cores.
Thanks,
-Brad
_____________________________________________________________________________________________________
From: Soumen [[email protected]]
Sent: Wednesday, January 29, 2014 6:15 AM
To: [email protected]
Subject: forall loop not using all cores.
Hi,
I have Chapel 1.8.0 installed with default settings on my desktop, which has
this configuration:
OS: Ubuntu 12.04.04
RAM: 4 GB
Processor: Intel Core i5-2320 @ 3.00 GHz x 4
The code I am trying to run is:
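var d: domain(1) = {1..2048};
var A: [d] int;

// Hypothetical stand-ins for my routines, added only so the example
// compiles; the real bodies are not shown here.
proc doSomethingOnAElements(i: int) { A[i] += i; }
proc doSomething() { }

forall i in d {
  doSomethingOnAElements(i);
}
for i in 1..40 {
  doSomething();
}
forall i in d {
  doSomethingOnAElements(i);
}
writeln(A);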
The problem is that, until writeln(A), the code runs on only a single core.
But after A is displayed, Chapel keeps all four cores busy for longer than it
took to display A.
The problem is the same if coforall or for is used instead of forall.
So how can I make Chapel use all four cores every time?
Below are the compiler options (all defaults) used for Chapel:
Parallelism Control Options:
      --[no-]local                    Target one [many] locale[s]
                                      currently: --local
      --[no-]serial                   [Don't] Serialize parallel constructs
                                      currently: --no-serial
      --[no-]serial-forall            [Don't] Serialize forall constructs
                                      currently: --no-serial-forall

Optimization Control Options:
      --fast                          Use fast default settings
                                      currently: not selected
      --[no-]fast-followers           Enable [disable] fast followers
                                      currently: --fast-followers
      --[no-]optimize-loop-iterators  Enable [disable] optimization of
                                      iterators composed of a single loop
                                      currently: --optimize-loop-iterators
Waiting for reply.
Soumen
_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users