Hi John --

I'm Greg Titus, and I work on communication and tasking, mostly in the runtime but some also in the modules and compiler. I've responded to your first paragraph, below.

On 2/10/2015 8:15 AM, John MacFrenz wrote:

  The number of tasks used to implement a forall loop is equal to the number
  of tasks created by its leader or standalone iterator.  For most of our
  standard/built-in types (ranges, domains, arrays, Block, etc.), this tends
  to be established by a global config named 'dataParTasksPerLocale', which
  itself defaults to 'here.maxTaskPar', the amount of concurrency that the
  tasking layer says it can support (which in turn tends to be the number of
  processor cores on the locale).
I'm trying to get Chapel to spawn just one task per locale, but haven't had 
success yet. I tried using --dataParTasksPerLocale=1, but monitoring CPU load 
showed this didn't work. I also tried setting the environment variable 
CHPL_RT_NUM_THREADS_PER_LOCALE. When that env var was set to 1, the program 
hung. When set to 2, it ran, but three of eight logical cores (4 physical, 8 
with HT) were loaded. Basically, I'd just like to monitor how load is balanced 
across locales without having access to a cluster. I have Chapel compiled with 
fifo and no hwloc; maybe this is the cause? Compiling Chapel with qthreads and 
hwloc failed; IIRC the error was "no working instance of hwloc found".

The best way to get Chapel to spawn exactly one task per locale is probably the direct approach, using a coforall-stmt:

   coforall loc in Locales do on loc do <body>;
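
For instance, here's a complete little program along those lines that also reports where each task runs and how much parallelism each locale thinks it has (just a sketch; the printed text is arbitrary):

   // one task per locale, each reporting its locale and available parallelism
   coforall loc in Locales do
     on loc do
       writeln("locale ", here.id, " (", here.name,
               "): maxTaskPar = ", here.maxTaskPar);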

Using a forall-stmt instead and trying to force it to create one task per locale will be tricky (perhaps impossible), because the forall explicitly gives Chapel the responsibility for determining the appropriate number of tasks per locale, and you can only influence that indirectly. That said, setting dataParTasksPerLocale=1 should result in each forall-stmt using exactly 1 task per locale. The downside is that it will do so throughout the entire life of the program, affecting every forall-stmt it runs, which is probably a bigger hammer than you want.

Monitoring CPU load may not give you an accurate read on what's happening, by the way, because user program tasks aren't necessarily the only things running in a Chapel program. I haven't fully kept up with this thread; did you build with CHPL_COMM=gasnet to use GASNet for inter-locale communication? If so, then you have an additional pthread running on every locale to handle Active Messages.

It wasn't clear to me how you were executing. Are you running your program on just one system node? If so, and if you're using gasnet, then with dataParTasksPerLocale=2 I think seeing 3 busy cores sounds right: 2 for the user tasks and 1 for the GASNet Active Message handler.

Setting CHPL_RT_NUM_THREADS_PER_LOCALE is definitely an indirect technique: instead of throttling Chapel task creation it throttles the number of underlying system threads used to run those tasks. Setting it to 1 will cause deadlock in many cases, as you discovered. Higher numbers will avoid that problem and the setting will percolate up to influence how forall-stmt iterators decide how many tasks to create, but the direct approach of using a coforall-stmt will still work better.
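
To see how these knobs interact, a tiny program like this can help (a sketch; I believe the dataParTasksPerLocale config const is visible to user code, and that its default value of 0 is taken by the data-parallel iterators to mean "use here.maxTaskPar"):

   // print the settings that govern default forall task counts
   writeln("here.maxTaskPar       = ", here.maxTaskPar);
   writeln("dataParTasksPerLocale = ", dataParTasksPerLocale);

Running it with --dataParTasksPerLocale=1 on the command line, or with CHPL_RT_NUM_THREADS_PER_LOCALE set in the environment, should let you watch each setting percolate through.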

Whether you use the fifo tasking layer implementation or qthreads won't matter much with regard to questions about Chapel tasks as such, but it will affect what you see on a system monitor. In particular, tasks=qthreads will create more busy-waiting pthreads than tasks=fifo will. So for these studies, fifo might be the better way to go.

Finally, with respect to the runtime build issues, which version of Chapel are you using? I'd be interested in knowing what the build problem was with hwloc/qthreads, if this was with the current 1.10 release. Can you send me your build log?

thanks,
greg



  I can't remember whether it was on this thread or another that I mentioned
  an intention to support 'ref' members of classes/records as part of
  upcoming work.  Is the thing you wanted the ability to have a class that
  is generic w.r.t. whether its fields are stored in place or by reference?
I think you mentioned that in this thread. Yes, that's pretty much what I 
want, except that certain fields should always be stored in place. But my 
current work-around of wrapping fields inside classes will do for now. I have 
a C/C++ background, where it has to be stated explicitly whether something is 
stored as a pointer or in place, so I sometimes find it rather confusing that 
some types are implicitly stored/passed/copied/assigned as references while 
others are not...
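
To illustrate, the workaround looks roughly like this (a sketch; 'Boxed' and the field names are made up):

   class Boxed {            // any class field gives reference semantics
     var x: real;
   }

   record R {
     var inPlace: int;      // record field: stored in place, copied on assignment
     var byRef: Boxed;      // class field: stored as a reference, shared on copy
   }

   var r1 = new R(inPlace=1, byRef=new Boxed(0.0));
   var r2 = r1;             // copies the int, but both records share one Boxed
   r2.byRef.x = 2.0;
   writeln(r1.byRef.x);     // prints 2.0: the class instance is aliased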

  I think the purpose of these is to do a deep copy rather than sharing.
  Block.dsiClone() I suspect is rarely used in practice, as it would relate
  to creating and copying variables of the domain map type, where most
  current uses don't do a lot of first-class manipulation of index sets.
  Block.dsiCreateReindexDist() is also used rarely, and we've recently been
  discussing different approaches to handling reindexing because the current
  interface causes challenges, both in implementation and optimization.

  There's a text file:

           doc/[release/]technotes/README.dsi

  that describes the dsi interface, and may serve as a good reference,
  though I haven't had the chance to spend much time with it myself.
dsiClone is one of the methods that is hardly documented at all in that file. I 
tried just commenting it out, but apparently it is used in some of the most 
basic operations (creation, assignment, iteration, etc. of domains/arrays). But 
knowing that I should make a deep copy is enough information for now, I think.

Regarding privatization, why are the dsiGetPrivatizeData() methods needed? Since 
dsiPrivatize(privatizeData) has access to "this", why can't it just get the 
values from that? Also, is there a way to explicitly request a reprivatization?
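
For concreteness, the hooks I'm asking about look something like this (a sketch with a made-up class and field, with details guessed from README.dsi):

   class MyDist {
     var n: int;

     proc dsiSupportsPrivatization() param return true;

     // runs where the original object lives; bundles up whatever the
     // privatized copies will need
     proc dsiGetPrivatizeData() return n;

     // runs on each remote locale to build the local copy; it receives
     // the data from dsiGetPrivatizeData() and also has access to 'this'
     proc dsiPrivatize(privatizeData) return new MyDist(n=privatizeData);
   }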
