Brad,
The system size for the picture was 10, but with 1000 it is basically the
same overall shape.
A thought I had this morning on my bus ride: get/put don't necessarily mean
movement of data but could mean task distribution. If it's possible to
separate the two in chplvis, it would be less confusing for beginners. No
idea if this is possible, but I can try to check it out at some point.
Thanks!
Barry
On Fri, Mar 31, 2017 at 1:04 AM, Barry Moore <[email protected]> wrote:
> Brad,
>
> I have some versions on github (https://github.com/barrymoo/c
> hapel-playground/tree/master/matrix-vector-multiply/versions)
>
> 1. version2.chpl - Uses Replicated Distribution of B, I tried both for and
> forall inner loops
> 2. version3.chpl - Uses a 3 x dimension domain for B distributed
> Cyclically similarly to A
>
> The github repo is a little messy because I was running a bunch of
> variants one directory up, but the version3.chpl was used to generate the
> image. I suspected (1) from some of your other emails which is why I tried
> this version too. It's nice to know I am thinking about this correctly.
> Even if it is only a simple problem.
>
> https://github.com/chapel-lang/chapel/pull/5912
>
>
> I will grab the request and test it on my code sometime this weekend.
>
> A side note: I did notice that using the ReplicatedDist had *slightly* more
> overhead (data transfered) than the by-hand replicated version. Probably
> some additional data carried by the structure to describe what locales it
> is replicated over? I don't think it is an issue, but I thought it was
> interesting.
>
> Thanks again!
>
> Barry
>
> On Thu, Mar 30, 2017 at 8:21 PM, Brad Chamberlain <[email protected]> wrote:
>
>>
>> I'm timing out today without digging into this, but will hope to look at
>> it more tomorrow. If your profiling calls are still outside of the forall
>> loop, we shouldn't expect it to be communication-free because the loop
>> logically starts on locale 0 and has to create tasks on the other locales;
>> but at a sniff, the image you sent does seem too high, at least for a
>> single loop. My two guesses would be: (1) maybe the Replicated
>> distribution is not working as well as it should (as I said, I'm not all
>> that familiar with it, and I don't think it sees much action in practice);
>> (2) maybe I led you astray in some way and we're hauling some global data
>> structure across the network from locale 0 more than we should be...
>>
>> Would you send me the final version of the code from which you created
>> this graph (if it changed from the last version modulo my suggested edits)
>> as well as the problem size at which you ran it?
>>
>> BTW, thanks for pointing out that tuple indexing issue bug... I'm
>> embarrassed that it slipped by us (me) and we've got a fix that we'll hope
>> to merge tonight once it passes some more testing:
>>
>> https://github.com/chapel-lang/chapel/pull/5912
>>
>> Thanks again,
>> -Brad
>>
>>
>> On Thu, 30 Mar 2017, Barry Moore wrote:
>>
>> Hey Guys,
>>>
>>> Definitely better now, but there seems to be a lot of communication going
>>> on. Maybe I am misinterpreting this plot. I would have guessed by forcing
>>> everything to be local on the node I wouldn't be doing any communication
>>> inside the work loop.
>>>
>>> Any thoughts? The rabbit hole gets deep fast ha.
>>>
>>> - Barry
>>>
>>> On Thu, Mar 30, 2017 at 4:31 PM, Brad Chamberlain <[email protected]>
>>> wrote:
>>>
>>>
>>>> Had a little time to cook up Brad's suggestion as a full program
>>>> (pasted in
>>>>
>>>>> [code][/code]). It is much better with respect to communication (not
>>>>> unsurprisingly based on the comments above). Also, it looks a LOT
>>>>> cleaner.
>>>>>
>>>>>
>>>> Good!
>>>>
>>>>
>>>> 1. Stylistically, camelCase everywhere? I noticed sometimes with
>>>> domains,
>>>>
>>>>> spaces, and locales you use CamelCase (there is probably a name for
>>>>> this,
>>>>> but I don't want to look it up). Is there a suggestion from you guys?
>>>>>
>>>>>
>>>> Most of us on the project use camelCase, but not exclusively --
>>>> underscores_still_appear at times. I'm probably an outlier, but I tend
>>>> to
>>>> use InitialCapsCamelCase for distributed domains and arrays because, in
>>>> my
>>>> mind, they are the big-ticket items from a cost/weight perspective and
>>>> deserve being distinguished in some way. Sometimes that bleeds over to
>>>> using initial caps for my "main" domains/arrays that define the overall
>>>> problem size/space as well in non-distributed programs. But again, I'd
>>>> say
>>>> use personal preference -- I'm not a big one for imposing
>>>> naming/stylistic
>>>> conventions on others (for better or worse) and to date there is no
>>>> Chapel
>>>> style guide.
>>>>
>>>>
>>>> 1. In this line `for (j, a) in zip({0..#dimension}, A(i(1), ..)) {`: I
>>>> use
>>>>
>>>>> 0 indexing everywhere, but I have to use i(1) in the for loop. It feels
>>>>> and
>>>>> looks awkward to me. Maybe it is unavoidable, but it will be hard to
>>>>> remember for me.
>>>>>
>>>>>
>>>> We tried to make Chapel as index-neutral as possible (i.e., arrays can
>>>> start at 0 or 1 or -3 or lo or whatever), but in a few places we had to
>>>> make a choice, where tuple indexing (what you're doing here) is one of
>>>> the
>>>> main ones. I've tried to concoct ways of making tuple indexing either
>>>> index-neutral or at least user-configurable, but haven't yet come up
>>>> with
>>>> anything appealing -- if you have ideas, send them in.
>>>>
>>>> In the meantime, one way to avoid relying on tuple indexing is to
>>>> detuple,
>>>> so:
>>>>
>>>> const D = {1..n, 1..n};
>>>>
>>>> forall (i,j) in D ... // i and j are integers
>>>>
>>>> rather than:
>>>>
>>>> forall ij in D ... // ij is a 2-tuple
>>>>
>>>> This stacks with zippering as well:
>>>>
>>>> forall ((i,j), a) in zip(D, A) ...
>>>>
>>>>
>>>> But shoot, in your code, vectorCyclic should really be yielding
>>>> integers,
>>>> not tuples since it's 1D. This looks like a regression that has been
>>>> introduced since 1.14 to me (I assume you're working on master?) and is
>>>> likely something that I broke... 1D arrays can be indexed using either
>>>> 1-tuples or integers, so I'm guessing that all of our existing tests
>>>> just
>>>> feed the indices yielded by the loop back into array accesses and are
>>>> none
>>>> the wiser... :( I'm going to see whether we can get a fix in before next
>>>> week's release to avoid this unintended change in behavior...
>>>>
>>>> Bottom line, you _ought_ to be able to write the inner loop as:
>>>>
>>>> for (j, a) in zip({0..#dimension}, A(i, ..)) {
>>>>
>>>> and as a means of working around the regression and avoiding the tuple
>>>> indexing, for the time being, you can write the outer loop as:
>>>>
>>>> forall ((i,), c) in zip(vectorCyclic, C) {
>>>>
>>>> and once the regression is fixed can change back to simply:
>>>>
>>>> forall (i, c) in zip(vectorCyclic, C) {
>>>> for (j, a) in zip({0..#dimension}, A(i, ..)) {
>>>>
>>>>
>>>> I still want to produce some timing results, but it will take me a
>>>> little
>>>>
>>>>> bit of time. Then, on to the matrix-matrix product :)
>>>>>
>>>>>
>>>> Be sure to use the --fast flag before doing any timing experiments...
>>>>
>>>> -Brad
>>>>
>>>>
>>>
>>>
>>> --
>>> Barry E Moore II, PhD
>>> E-mail: [email protected]
>>>
>>> Assistant Research Professor
>>> Center for Simulation and Modeling
>>> University of Pittsburgh
>>> Pittsburgh, PA 15260
>>>
>>>
>
>
> --
> Barry E Moore II, PhD
> E-mail: [email protected]
>
> Assistant Research Professor
> Center for Simulation and Modeling
> University of Pittsburgh
> Pittsburgh, PA 15260
>
--
Barry E Moore II, PhD
E-mail: [email protected]
Assistant Research Professor
Center for Simulation and Modeling
University of Pittsburgh
Pittsburgh, PA 15260
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users