Hi John --

I'm Greg Titus, and I work on communication and tasking, mostly in the runtime but some also in the modules and compiler. I've responded to your first paragraph, below.

On 2/10/2015 8:15 AM, John MacFrenz wrote:

  The number of tasks used to implement a forall loop is equal to the number
  of tasks created by its leader or standalone iterator.  For most of our
  standard/built-in types (ranges, domains, arrays, Block, etc.), this tends
  to be established by a global config named 'dataParTasksPerLocale', which
  itself defaults to 'here.maxTaskPar', the amount of concurrency that the
  tasking layer says it can support (which in turn tends to be the number of
  processor cores on the locale).
I'm trying to get Chapel to spawn just one task per locale, but haven't had 
success yet. I tried using --dataParTasksPerLocale=1, but monitoring CPU load 
showed this didn't work. I also tried setting the environment variable 
CHPL_RT_NUM_THREADS_PER_LOCALE. When that env var was set to 1, the program 
hung. When set to 2, it ran, but three of eight logical cores (4 physical, 8 
with HT) were loaded. Basically, I'd just like to monitor how load is balanced 
across locales without having access to a cluster. I have Chapel compiled with 
fifo and no hwloc; maybe this is the cause? Compiling Chapel with qthreads and 
hwloc failed; IIRC the error was "no working instance of hwloc found".

The best way to get Chapel to spawn exactly one task per locale is probably the direct approach, using a coforall-stmt:

   coforall loc in Locales do on loc do <body>;
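
For instance, here's a complete little program along those lines that also reports where each task runs and how much parallelism each locale thinks it has (just a sketch; the printed text is arbitrary):

   // one task per locale, each reporting its locale and available parallelism
   coforall loc in Locales do
     on loc do
       writeln("locale ", here.id, " (", here.name,
               "): maxTaskPar = ", here.maxTaskPar);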

Using a forall-stmt instead and trying to force it to create one task per locale will be tricky (perhaps impossible), because the forall explicitly gives Chapel the responsibility for determining the appropriate number of tasks per locale, and you can only influence that indirectly. That said, setting dataParTasksPerLocale=1 should result in each forall-stmt using exactly 1 task per locale. The downside is that it will do so throughout the entire life of the program, affecting every forall-stmt it runs, which is probably a bigger hammer than you want.

Monitoring CPU load may not give you an accurate read on what's happening, by the way, because user program tasks aren't necessarily the only things running in a Chapel program. I haven't fully kept up with this thread; did you build with CHPL_COMM=gasnet to use GASNet for inter-locale communication? If so, then you have an additional pthread running on every locale to handle Active Messages.

It wasn't clear to me how you were executing. Are you running your program on just one system node? If so, and if you're using gasnet, then with dataParTasksPerLocale=2 I think seeing 3 busy cores sounds right: 2 for the user tasks and 1 for the GASNet Active Message handler.

Setting CHPL_RT_NUM_THREADS_PER_LOCALE is definitely an indirect technique: instead of throttling Chapel task creation it throttles the number of underlying system threads used to run those tasks. Setting it to 1 will cause deadlock in many cases, as you discovered. Higher numbers will avoid that problem and the setting will percolate up to influence how forall-stmt iterators decide how many tasks to create, but the direct approach of using a coforall-stmt will still work better.
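
To see how these knobs interact, a tiny program like this can help (a sketch; I believe the dataParTasksPerLocale config const is visible to user code, and that its default value of 0 is taken by the data-parallel iterators to mean "use here.maxTaskPar"):

   // print the settings that govern default forall task counts
   writeln("here.maxTaskPar       = ", here.maxTaskPar);
   writeln("dataParTasksPerLocale = ", dataParTasksPerLocale);

Running it with --dataParTasksPerLocale=1 on the command line, or with CHPL_RT_NUM_THREADS_PER_LOCALE set in the environment, should let you watch each setting percolate through.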

Whether you use the fifo tasking layer implementation or qthreads won't matter much with regard to questions about Chapel tasks as such, but it will affect what you see on a system monitor. In particular, tasks=qthreads will create more busy-waiting pthreads than tasks=fifo will. So for these studies, fifo might be the better way to go.

Finally, with respect to the runtime build issues, which version of Chapel are you using? I'd be interested in knowing what the build problem was with hwloc/qthreads, if this was with the current 1.10 release. Can you send me your build log?

thanks,
greg



  I can't remember whether it was on this thread or another that I mentioned
  an intention to support 'ref' members of classes/records as part of
  upcoming work.  Is the thing you wanted the ability to have a class that
  is generic w.r.t. whether its fields are stored in place or by reference?
I think you mentioned that in this thread. Yes, that's pretty much what I 
want, except that certain fields should always be stored in place. But my 
current work-around of wrapping fields inside classes will do for now. I have 
a C/C++ background, where it has to be stated explicitly whether something is 
stored as a pointer or in place, so I sometimes find it rather confusing that 
some types are implicitly stored/passed/copied/assigned as references while 
others are not...
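
To illustrate, the workaround looks roughly like this (a sketch; 'Boxed' and the field names are made up):

   class Boxed {            // any class field gives reference semantics
     var x: real;
   }

   record R {
     var inPlace: int;      // record field: stored in place, copied on assignment
     var byRef: Boxed;      // class field: stored as a reference, shared on copy
   }

   var r1 = new R(inPlace=1, byRef=new Boxed(0.0));
   var r2 = r1;             // copies the int, but both records share one Boxed
   r2.byRef.x = 2.0;
   writeln(r1.byRef.x);     // prints 2.0: the class instance is aliased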

  I think the purpose of these is to do a deep copy rather than sharing.
  Block.dsiClone() I suspect is rarely used in practice, as it would relate
  to creating and copying variables of the domain map type, where most
  current uses don't do a lot of first-class manipulation of index sets.
  Block.dsiCreateReindexDist() is also used rarely, and we've recently been
  discussing different approaches to handling reindexing because the current
  interface causes challenges, both in implementation and optimization.

  There's a text file:

           doc/[release/]technotes/README.dsi

  that describes the dsi interface, and may serve as a good reference,
  though I haven't had the chance to spend much time with it myself.
dsiClone is one of the methods that is hardly documented at all in that file. I 
tried just commenting it out, but apparently it is used in some of the most 
basic operations (creation, assignment, iteration, etc. of domains/arrays). But 
knowing that I should make a deep copy is enough information for now, I think.

Regarding privatization, why are the dsiGetPrivatizeData() methods needed? Since 
dsiPrivatize(privatizeData) has access to "this", why can't it just get the 
values from that? Also, is there a way to explicitly request a reprivatization?
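
For concreteness, the hooks I'm asking about look something like this (a sketch with a made-up class and field, with details guessed from README.dsi):

   class MyDist {
     var n: int;

     proc dsiSupportsPrivatization() param return true;

     // runs where the original object lives; bundles up whatever the
     // privatized copies will need
     proc dsiGetPrivatizeData() return n;

     // runs on each remote locale to build the local copy; it receives
     // the data from dsiGetPrivatizeData() and also has access to 'this'
     proc dsiPrivatize(privatizeData) return new MyDist(n=privatizeData);
   }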
