Hi John -

To use the cache for remote data, specify --cache-remote when compiling
your own program with chpl, not when building Chapel itself (the Chapel
compiler make process should already build support for it). As I said
before, we expect it to work with FIFO tasking, but qthreads will need
work before the cache can be relied upon there.

Regarding the communication counts - see
test/performance/ferguson/remote-class-read.chpl for an example program
that measures and prints out the communication counts (numbers of gets,
puts, etc.).
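
As a rough idea of what such a measurement can look like, here is a
minimal sketch. It is not the contents of that test file. It assumes a
Chapel version where the diagnostics routines are provided by the
CommDiagnostics module, and the variable x and the on-clause are made up
purely for illustration:

```chapel
use CommDiagnostics;

// Hypothetical data for illustration: x is stored on locale 0.
var x = 42;

// Clear any earlier counts, then start counting communication.
resetCommDiagnostics();
startCommDiagnostics();

on Locales[numLocales-1] {
  // When run on more than one locale, reading and writing x from here
  // may require remote gets/puts (the compiler can optimize some away).
  var y = x;
  x = y + 1;
}

stopCommDiagnostics();

// getCommDiagnostics() returns one record of counters per locale.
const counts = getCommDiagnostics();
for (loc, c) in zip(Locales, counts) do
  writeln(loc, ": gets=", c.get, " puts=", c.put);
```

Run it on more than one locale (for example with -nl 2) to see nonzero
counts. The exact numbers depend on the Chapel version and on compiler
optimizations, so treat them as something to compare between runs of your
own code rather than as fixed values.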

Cheers,

-michael

On Wed, Feb 18, 2015 at 12:36 PM, John MacFrenz <[email protected]> wrote:

> Hi,
>
> I've been a bit busy lately, but I should have time to work more on this
> issue next week.
>
> 18.02.2015, 18:47, "Michael Ferguson" <[email protected]>:
>
> Hi John -
>
> I just wanted to add a little to this exchange. You were talking about
> your custom distribution and you said:
>
>
> My implementation isn't quite as efficient as I'd like, though I'll see
> what kind of improvement some caching will bring...
>
>
> You might be interested to try the --cache-remote option when compiling
> (which should work with GASNet and FIFO tasking, and maybe with qthreads).
> That will activate a runtime feature that caches remote data. This feature
> might be able to improve your distribution's performance without a lot of
> extra work - or give you an idea how much you can reduce communication with
> privatization/manual caching in the distribution implementation. If you try
> it - please let me know how it goes since this feature is still pretty new.
>
>
>
> I actually did already implement some caching, and it improved
> performance significantly. I can check whether that option brings some
> further improvement, though I doubt further caching would help
> significantly unless I have missed something very obvious... Could you
> clarify whether I should pass that option when compiling Chapel or just
> my own program? I guess the --fast flag would also help.
>
>
>
> As another note, you mentioned a desire to check the load balance when
> running on a single machine (vs on a cluster). In that situation, be sure
> to carefully measure something that will be the same in a cluster run. In
> particular, runtime is not a good thing to measure in this situation
> because you'll probably be oversubscribing the single machine and each task
> will slow down (because tasks/threads will be multiplexed to fewer
> processors). However, things like # of array elements processed on each
> locale or the communication counts are good things to measure that should
> be the same when you execute on a real cluster.
>
>
> Hmm, I have to think about that. I needed a mechanism to measure execution
> time anyway, so that would have been readily available; that's why I
> thought of that first. But the number of elements per locale should be
> very easy to gather.
>
>
>
> If you'd like help with gathering communication counts - just ask.
>
> Sure, I would appreciate any help :) Though some more work on the code is
> required first. What kind of hardware do you have access to?
>
>
>
>
_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users