Hi John --
I'm Greg Titus, and I work on communication and tasking, mostly in the
runtime but also somewhat in the modules and compiler. I've responded
to your first paragraph, below.
On 2/10/2015 8:15 AM, John MacFrenz wrote:
>> The number of tasks used to implement a forall loop is equal to the
>> number of tasks created by its leader or standalone iterator. For most
>> of our standard/built-in types (ranges, domains, arrays, Block, etc.)
>> this tends to be established by a global config named
>> 'dataParTasksPerLocale', which itself defaults to 'here.maxTaskPar',
>> the amount of concurrency that the tasking layer says it can support
>> (which in turn tends to be the number of processor cores on the
>> locale).
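Both of those knobs are visible to user code, by the way, so you can
check what a given run is actually using. A tiny sketch (the file name
and output text are mine):

    // taskKnobs.chpl -- print the settings that drive forall task counts
    writeln("here.maxTaskPar       = ", here.maxTaskPar);
    writeln("dataParTasksPerLocale = ", dataParTasksPerLocale);

Since dataParTasksPerLocale is a config const, running
'./taskKnobs --dataParTasksPerLocale=1' should show the override take
effect.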
> I'm trying to get Chapel to spawn just one task per locale, but haven't
> had success yet. I tried using --dataParTasksPerLocale=1, but monitoring
> CPU load showed this didn't work. I also tried setting the environment
> variable CHPL_RT_NUM_THREADS_PER_LOCALE. When that env var was set to 1,
> the program hung. When set to 2, it ran, but three of eight logical
> cores (4 physical, 8 with HT) were loaded. Basically I'd just like to
> monitor how load is balanced across locales without having access to a
> cluster. I have Chapel compiled with fifo and no hwloc; maybe this is
> the cause? Compiling Chapel with qthreads and hwloc failed; IIRC the
> error was "no working instance of hwloc found".
The best way to get Chapel to spawn exactly one task per locale is
probably the direct approach, using a coforall-stmt:
    coforall loc in Locales do on loc do <body>;
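For example, a complete (if trivial) program along these lines should
run exactly one user task on each locale (the file name and output text
are mine):

    // oneTaskPerLocale.chpl -- one task on each locale, created directly
    coforall loc in Locales do on loc {
      // this body runs as a single task on locale 'loc'
      writeln("hello from locale ", here.id, " of ", numLocales,
              " (maxTaskPar = ", here.maxTaskPar, ")");
    }

Whatever you put in that body will then be the only user-level task on
each locale, so any additional load you see comes from the runtime
itself.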
Using a forall-stmt instead and trying to force it to create one task
per locale will be tricky (perhaps impossible), because the forall
explicitly gives Chapel the responsibility for determining the
appropriate number of tasks per locale and you can only indirectly
influence how it does that. That said, setting dataParTasksPerLocale=1
should result in exactly one task per locale for each forall-stmt. The
downside is that it will do so throughout the entire life of the
program, affecting every forall-stmt it runs, which is probably a
bigger hammer than you want.

Monitoring CPU load may not give you an accurate read on what's
happening, by the way, because user program tasks aren't necessarily
the only things running in a Chapel program. I haven't fully kept up
with this thread; did you build with CHPL_COMM=gasnet to use GASNet for
inter-locale communication? If so, then you have an additional pthread
running on every locale to handle Active Messages.
It wasn't clear to me how you were executing. Are you running your
program on just one system node? If so, and if you're using GASNet,
then with dataParTasksPerLocale=2 seeing 3 busy cores sounds right: 2
for the user tasks and 1 for the GASNet Active Message handler.
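If the system monitor proves too blunt an instrument, you can also peek
from inside the program. A rough sketch using the locale's
runningTasks() query (the loop bound is arbitrary):

    // report roughly how many tasks are active while a forall runs
    forall i in 1..1000000 {
      if i == 1 then  // report once, from whichever task runs iteration 1
        writeln("tasks running during forall: ", here.runningTasks());
    }

The exact count can include the program's main task, but with
dataParTasksPerLocale=1 it should stay small and fixed rather than
growing to one task per core.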
Setting CHPL_RT_NUM_THREADS_PER_LOCALE is definitely an indirect
technique: instead of throttling Chapel task creation it throttles the
number of underlying system threads used to run those tasks. Setting it
to 1 will cause deadlock in many cases, as you discovered. Higher
numbers will avoid that problem and the setting will percolate up to
influence how forall-stmt iterators decide how many tasks to create, but
the direct approach of using a coforall-stmt will still work better.
Whether you use the fifo tasking layer implementation or qthreads won't
matter much with regard to questions about Chapel tasks as such, but it
will affect what you see on a system monitor. In particular,
tasks=qthreads will create more busy-waiting pthreads than tasks=fifo
will. So for these studies, fifo might be the better way to go.
Finally, with respect to the runtime build issues, which version of
Chapel are you using? I'd be interested in knowing what the build
problem was with hwloc/qthreads, if this was with the current 1.10
release. Can you send me your build log?
thanks,
greg
>> I can't remember whether it was on this thread or another that I
>> mentioned an intention to support 'ref' members of classes/records as
>> part of upcoming work. Is the thing you wanted the ability to have a
>> class that is generic w.r.t. whether its fields are stored in place
>> or by reference?
> I think you mentioned that in this thread. Yes, that is pretty much
> what I want, except that certain fields should always be stored in
> place. But my current work-around of wrapping fields inside classes
> will do for now. I have a C/C++ background, where it has to be stated
> explicitly whether something is stored as a pointer or in place, so I
> sometimes find it rather confusing that some types are implicitly
> stored/passed/copied/assigned as references while others are not...
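For what it's worth, the wrapper idiom described above can be quite
small. A minimal sketch (the type and field names are mine):

    // classes are reference types, so wrapping a value in a class
    // gives a record field reference semantics
    class Wrap {
      var val: real;
    }

    record R {
      var inPlace: int;   // stored in the record itself
      var byRef: Wrap;    // a class reference: copies of R share one Wrap
    }

    var w  = new Wrap(3.14);
    var r1 = new R(1, w);
    var r2 = r1;            // r2.inPlace is a fresh copy...
    r2.byRef.val = 2.72;    // ...but both records share the one Wrap
    writeln(r1.byRef.val);  // prints 2.72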
>> I think the purpose of these is to do a deep copy rather than
>> sharing. Block.dsiClone() I suspect is rarely used in practice, as it
>> would relate to creating and copying variables of the domain map
>> type, where most current uses don't do a lot of first-class
>> manipulation of index sets. Block.dsiCreateReindexDist() is also used
>> rarely, and we've recently been discussing different approaches to
>> handling reindexing because the current interface causes challenges,
>> both in implementation and optimization.
>>
>> There's a text file:
>>
>>   doc/[release/]technotes/README.dsi
>>
>> that describes the dsi interface and may serve as a good reference,
>> though I haven't had the chance to spend much time with it myself.
> dsiClone is one of the methods that is hardly documented at all in
> that file. I tried just commenting it out, but apparently it is used
> in some of the most basic operations (creation, assignment, iteration,
> etc. of domains/arrays). But knowing that I should make a deep copy is
> enough information for now, I think.
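To make the deep-copy point concrete, the shape of such a method is
roughly as follows (a schematic sketch, not the real Block
implementation; the class and field names are made up):

    // schematic: dsiClone() hands back an independent copy of the
    // domain-map descriptor so later mutations don't affect the original
    class MyDist {
      var nTargets: int;
      proc dsiClone() {
        return new MyDist(nTargets);
      }
    }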
> In privatization, why are the dsiGetPrivatizeData() methods needed?
> Since dsiPrivatize(privatizeData) has access to "this", why can't it
> just get values from that? Also, is there a way to explicitly request
> a reprivatization?