Also, would you mind giving a (very) brief explanation of the caching system used? Or point me to a relevant paper.

I meant the one that's present in the BlockDist.chpl file, called the RAD cache. Or is there more than one in there? Information related to stencils and ghost cells would be especially appreciated.

There currently isn't support in BlockDist itself for stencil-style caching. There's a start at how this might be added in [test/release/]examples/benchmarks/miniMD/helpers/StencilDist.chpl, which is itself a clone of BlockDist, extended to support halos/ghost cells ("fluff"). The main downside of using StencilDist at present is that the user is responsible for explicitly requesting updates to the caches (rather than having the compiler automatically insert those calls to support the "plug-and-play" vision of domain maps requiring no code changes). The miniMD code in that directory (and its parent) is the best illustration of this feature at present; a rough sketch of the usage pattern follows below.
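To make the "explicit update" point concrete, here's a minimal sketch of what a 5-point stencil might look like over a Stencil-distributed array. The constructor arguments and the updateFluff() call are my assumptions based on StencilDist's BlockDist heritage; check StencilDist.chpl itself for the actual interface:

    use StencilDist;  // module name assumed from the miniMD helper

    config const n = 8, numIters = 10;
    const Space = {1..n, 1..n};

    // Block-distribute Space, asking for a one-element halo ("fluff")
    // in each dimension (signature assumed, not verified).
    const D = Space dmapped Stencil(boundingBox=Space, fluff=(1,1));
    var A, B: [D] real;

    for 1..numIters {
      // The user must explicitly refresh each locale's cached halo
      // values; the compiler does not insert this call automatically.
      A.updateFluff();
      forall (i,j) in D.expand(-1) do
        B[i,j] = (A[i-1,j] + A[i+1,j] + A[i,j-1] + A[i,j+1]) / 4;
      A = B;
    }

The point of contrast with the plug-and-play vision is simply that the updateFluff()-style call appears in user code rather than being inserted by the compiler.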

The RAD cache is a different type of cache. It's designed to cache the meta-data in the array class itself (essentially "the dope vector"), which permits one locale to index into another's memory in a single message.
That is, if you own index (i,j) of a distributed array, I can do one of the following:

(a) remotely access all the meta-data from your descriptor that I need to
    do the indexing calculation to determine the element's address myself
    (requires lots of gets);

(b) ship you (i,j), have you do all the indexing and return a reference
    to the array element (requires an active message);

(c) cache all the meta-data from your descriptor such that, once I have
    it, I can do the indexing locally (requires gets but only to populate
    the cache; after that, remote accesses should be free).
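
As a toy, single-locale illustration of option (c), the following sketch caches the handful of descriptor fields needed to turn (i,j) into a flat offset locally. The record and field names are purely illustrative, not Chapel's internal RAD types:

    // Cached copy of another locale's array "dope vector" fields
    // (illustrative names; not the actual internal RAD class).
    record RADEntry {
      var lo: 2*int;    // the remote block's low bounds
      var blk: 2*int;   // per-dimension stride factors
    }

    // Once the cache is populated, the indexing arithmetic is purely
    // local; only the element itself would need a single remote GET.
    proc flatOffset(rad: RADEntry, i: int, j: int): int {
      return (i - rad.lo(1)) * rad.blk(1) + (j - rad.lo(2)) * rad.blk(2);
    }

    const rows = 4, cols = 8;
    var data: [0..#(rows*cols)] real;  // stand-in for the remote buffer
    const rad = new RADEntry(lo=(1,1), blk=(cols,1));

    data[flatOffset(rad, 2, 3)] = 42.0;
    writeln(data[flatOffset(rad, 2, 3)]);  // prints 42.0

In the real implementation the final element access crosses the network, but once the metadata is cached, no further communication is needed to compute addresses.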

My high-level understanding is that enabling this optimization for a given distribution amounts to identifying which parts of the array descriptor need to be cached to support this type of remote access. For the changes to Block that we've discussed, I would guess nothing additional would be required. (I also typically suggest that people not worry about this optimization until they have things up and running.)

The best description of this that I'm aware of is in the commit messages that added the capability. For example:

        https://github.com/bradcray/chapel/commit/633353f
        https://github.com/bradcray/chapel/commit/fa9b841

though there may be other documentation. I can ask around if you'd like and nobody else speaks up on this thread.

-Brad