Brad,

I have some versions on github (https://github.com/barrymoo/
chapel-playground/tree/master/matrix-vector-multiply/versions)

1. version2.chpl - Uses Replicated Distribution of B, I tried both for and
forall inner loops
2. version3.chpl - Uses a 3 x dimension domain for B distributed Cyclically
similarly to A

The github repo is a little messy because I was running a bunch of variants
one directory up, but the version3.chpl was used to generate the image. I
suspected (1) from some of your other emails which is why I tried this
version too. It's nice to know I am thinking about this correctly. Even if
it is only a simple problem.

https://github.com/chapel-lang/chapel/pull/5912


I will grab the request and test it on my code sometime this weekend.

A side note: I did notice that using the ReplicatedDist had *slightly* more
overhead (data transfered) than the by-hand replicated version. Probably
some additional data carried by the structure to describe what locales it
is replicated over? I don't think it is an issue, but I thought it was
interesting.

Thanks again!

Barry

On Thu, Mar 30, 2017 at 8:21 PM, Brad Chamberlain <[email protected]> wrote:

>
> I'm timing out today without digging into this, but will hope to look at
> it more tomorrow.  If your profiling calls are still outside of the forall
> loop, we shouldn't expect it to be communication-free because the loop
> logically starts on locale 0 and has to create tasks on the other locales;
> but at a sniff, the image you sent does seem too high, at least for a
> single loop.  My two guesses would be:  (1) maybe the Replicated
> distribution is not working as well as it should (as I said, I'm not all
> that familiar with it, and I don't think it sees much action in practice);
> (2) maybe I led you astray in some way and we're hauling some global data
> structure across the network from locale 0 more than we should be...
>
> Would you send me the final version of the code from which you created
> this graph (if it changed from the last version modulo my suggested edits)
> as well as the problem size at which you ran it?
>
> BTW, thanks for pointing out that tuple indexing issue bug...  I'm
> embarrassed that it slipped by us (me) and we've got a fix that we'll hope
> to merge tonight once it passes some more testing:
>
>         https://github.com/chapel-lang/chapel/pull/5912
>
> Thanks again,
> -Brad
>
>
> On Thu, 30 Mar 2017, Barry Moore wrote:
>
> Hey Guys,
>>
>> Definitely better now, but there seems to be a lot of communication going
>> on. Maybe I am misinterpreting this plot. I would have guessed by forcing
>> everything to be local on the node I wouldn't be doing any communication
>> inside the work loop.
>>
>> Any thoughts? The rabbit hole gets deep fast ha.
>>
>> - Barry
>>
>> On Thu, Mar 30, 2017 at 4:31 PM, Brad Chamberlain <[email protected]> wrote:
>>
>>
>>> Had a little time to cook up Brad's suggestion as a full program (pasted
>>> in
>>>
>>>> [code][/code]). It is much better with respect to communication (not
>>>> unsurprisingly based on the comments above). Also, it looks a LOT
>>>> cleaner.
>>>>
>>>>
>>> Good!
>>>
>>>
>>> 1. Stylistically, camelCase everywhere? I noticed sometimes with domains,
>>>
>>>> spaces, and locales you use CamelCase (there is probably a name for
>>>> this,
>>>> but I don't want to look it up). Is there a suggestion from you guys?
>>>>
>>>>
>>> Most of us on the project use camelCase, but not exclusively --
>>> underscores_still_appear at times.  I'm probably an outlier, but I tend
>>> to
>>> use InitialCapsCamelCase for distributed domains and arrays because, in
>>> my
>>> mind, they are the big-ticket items from a cost/weight perspective and
>>> deserve being distinguished in some way.  Sometimes that bleeds over to
>>> using initial caps for my "main" domains/arrays that define the overall
>>> problem size/space as well in non-distributed programs.  But again, I'd
>>> say
>>> use personal preference -- I'm not a big one for imposing
>>> naming/stylistic
>>> conventions on others (for better or worse) and to date there is no
>>> Chapel
>>> style guide.
>>>
>>>
>>> 1. In this line `for (j, a) in zip({0..#dimension}, A(i(1), ..)) {`: I
>>> use
>>>
>>>> 0 indexing everywhere, but I have to use i(1) in the for loop. It feels
>>>> and
>>>> looks awkward to me. Maybe it is unavoidable, but it will be hard to
>>>> remember for me.
>>>>
>>>>
>>> We tried to make Chapel as index-neutral as possible (i.e., arrays can
>>> start at 0 or 1 or -3 or lo or whatever), but in a few places we had to
>>> make a choice, where tuple indexing (what you're doing here) is one of
>>> the
>>> main ones.  I've tried to concoct ways of making tuple indexing either
>>> index-neutral or at least user-configurable, but haven't yet come up with
>>> anything appealing -- if you have ideas, send them in.
>>>
>>> In the meantime, one way to avoid relying on tuple indexing is to
>>> detuple,
>>> so:
>>>
>>>         const D = {1..n, 1..n};
>>>
>>>         forall (i,j) in D ...  // i and j are integers
>>>
>>> rather than:
>>>
>>>         forall ij in D ...  // ij is a 2-tuple
>>>
>>> This stacks with zippering as well:
>>>
>>>         forall ((i,j), a) in zip(D, A) ...
>>>
>>>
>>> But shoot, in your code, vectorCyclic should really be yielding integers,
>>> not tuples since it's 1D.  This looks like a regression that has been
>>> introduced since 1.14 to me (I assume you're working on master?) and is
>>> likely something that I broke...  1D arrays can be indexed using either
>>> 1-tuples or integers, so I'm guessing that all of our existing tests just
>>> feed the indices yielded by the loop back into array accesses and are
>>> none
>>> the wiser... :( I'm going to see whether we can get a fix in before next
>>> week's release to avoid this unintended change in behavior...
>>>
>>> Bottom line, you _ought_ to be able to write the inner loop as:
>>>
>>>           for (j, a) in zip({0..#dimension}, A(i, ..)) {
>>>
>>> and as a means of working around the regression and avoiding the tuple
>>> indexing, for the time being, you can write the outer loop as:
>>>
>>>         forall ((i,), c) in zip(vectorCyclic, C) {
>>>
>>> and once the regression is fixed can change back to simply:
>>>
>>>         forall (i, c) in zip(vectorCyclic, C) {
>>>           for (j, a) in zip({0..#dimension}, A(i, ..)) {
>>>
>>>
>>> I still want to produce some timing results, but it will take me a little
>>>
>>>> bit of time. Then, on to the matrix-matrix product :)
>>>>
>>>>
>>> Be sure to use the --fast flag before doing any timing experiments...
>>>
>>> -Brad
>>>
>>>
>>
>>
>> --
>> Barry E Moore II, PhD
>> E-mail: [email protected]
>>
>> Assistant Research Professor
>> Center for Simulation and Modeling
>> University of Pittsburgh
>> Pittsburgh, PA 15260
>>
>>


-- 
Barry E Moore II, PhD
E-mail: [email protected]

Assistant Research Professor
Center for Simulation and Modeling
University of Pittsburgh
Pittsburgh, PA 15260
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users

Reply via email to