I'm timing out today without digging into this, but will hope to look at
it more tomorrow. If your profiling calls are still outside of the forall
loop, we shouldn't expect it to be communication-free because the loop
logically starts on locale 0 and has to create tasks on the other locales;
but at a sniff, the image you sent does seem too high, at least for a
single loop. My two guesses would be: (1) maybe the Replicated
distribution is not working as well as it should (as I said, I'm not all
that familiar with it, and I don't think it sees much action in practice);
(2) maybe I led you astray in some way and we're hauling some global data
structure across the network from locale 0 more than we should be...
Would you send me the final version of the code from which you created
this graph (if it changed from the last version modulo my suggested edits)
as well as the problem size at which you ran it?
BTW, thanks for pointing out that tuple indexing issue bug... I'm
embarrassed that it slipped by us (me) and we've got a fix that we'll hope
to merge tonight once it passes some more testing:
https://github.com/chapel-lang/chapel/pull/5912
Thanks again,
-Brad
On Thu, 30 Mar 2017, Barry Moore wrote:
> Hey Guys,
>
> Definitely better now, but there seems to be a lot of communication going
> on. Maybe I am misinterpreting this plot. I would have guessed by forcing
> everything to be local on the node I wouldn't be doing any communication
> inside the work loop.
>
> Any thoughts? The rabbit hole gets deep fast ha.
>
> - Barry
>
> On Thu, Mar 30, 2017 at 4:31 PM, Brad Chamberlain <[email protected]> wrote:
>
>>
>> Had a little time to cook up Brad's suggestion as a full program (pasted in
>>> [code][/code]). It is much better with respect to communication (not
>>> unsurprisingly based on the comments above). Also, it looks a LOT cleaner.
>>>
>>
>> Good!
>>
>>
>> 1. Stylistically, camelCase everywhere? I noticed sometimes with domains,
>>> spaces, and locales you use CamelCase (there is probably a name for this,
>>> but I don't want to look it up). Is there a suggestion from you guys?
>>>
>>
>> Most of us on the project use camelCase, but not exclusively --
>> underscores_still_appear at times. I'm probably an outlier, but I tend to
>> use InitialCapsCamelCase for distributed domains and arrays because, in my
>> mind, they are the big-ticket items from a cost/weight perspective and
>> deserve being distinguished in some way. Sometimes that bleeds over to
>> using initial caps for my "main" domains/arrays that define the overall
>> problem size/space as well in non-distributed programs. But again, I'd say
>> use personal preference -- I'm not a big one for imposing naming/stylistic
>> conventions on others (for better or worse) and to date there is no Chapel
>> style guide.
>>
>>
>> 1. In this line `for (j, a) in zip({0..#dimension}, A(i(1), ..)) {`: I use
>>> 0 indexing everywhere, but I have to use i(1) in the for loop. It feels
>>> and
>>> looks awkward to me. Maybe it is unavoidable, but it will be hard to
>>> remember for me.
>>>
>>
>> We tried to make Chapel as index-neutral as possible (i.e., arrays can
>> start at 0 or 1 or -3 or lo or whatever), but in a few places we had to
>> make a choice, where tuple indexing (what you're doing here) is one of the
>> main ones. I've tried to concoct ways of making tuple indexing either
>> index-neutral or at least user-configurable, but haven't yet come up with
>> anything appealing -- if you have ideas, send them in.
>>
>> In the meantime, one way to avoid relying on tuple indexing is to detuple,
>> so:
>>
>> const D = {1..n, 1..n};
>>
>> forall (i,j) in D ... // i and j are integers
>>
>> rather than:
>>
>> forall ij in D ... // ij is a 2-tuple
>>
>> This stacks with zippering as well:
>>
>> forall ((i,j), a) in zip(D, A) ...
>>
>>
>> But shoot, in your code, vectorCyclic should really be yielding integers,
>> not tuples since it's 1D. This looks like a regression that has been
>> introduced since 1.14 to me (I assume you're working on master?) and is
>> likely something that I broke... 1D arrays can be indexed using either
>> 1-tuples or integers, so I'm guessing that all of our existing tests just
>> feed the indices yielded by the loop back into array accesses and are none
>> the wiser... :( I'm going to see whether we can get a fix in before next
>> week's release to avoid this unintended change in behavior...
>>
>> Bottom line, you _ought_ to be able to write the inner loop as:
>>
>> for (j, a) in zip({0..#dimension}, A(i, ..)) {
>>
>> and as a means of working around the regression and avoiding the tuple
>> indexing, for the time being, you can write the outer loop as:
>>
>> forall ((i,), c) in zip(vectorCyclic, C) {
>>
>> and once the regression is fixed can change back to simply:
>>
>> forall (i, c) in zip(vectorCyclic, C) {
>> for (j, a) in zip({0..#dimension}, A(i, ..)) {
>>
>>
>> I still want to produce some timing results, but it will take me a little
>>> bit of time. Then, on to the matrix-matrix product :)
>>>
>>
>> Be sure to use the --fast flag before doing any timing experiments...
>>
>> -Brad
>>
>
>
>
> --
> Barry E Moore II, PhD
> E-mail: [email protected]
>
> Assistant Research Professor
> Center for Simulation and Modeling
> University of Pittsburgh
> Pittsburgh, PA 15260
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users