This is good work, so I am happy to see it come over.  My initial
understanding was that there would be compile-time protection for this.
In the absence of that, I think we need to see performance data on a
variety of communication substrates.  A latency measurement is perhaps
the most sensitive one, and should be sufficient to see the impact on
the critical path.

Rich


On 7/25/07 9:04 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> WHAT:    Merge the sparse groups work to the trunk; get the community's
>           opinion on one remaining issue.
> WHY:     For large MPI jobs, it can be memory-prohibitive to fully
>           represent dense groups; you can save a lot of space by having
>           "sparse" representations of groups that are (for example)
>           derived from MPI_COMM_WORLD.
> WHERE:   Main changes are (might have missed a few in this analysis,
>           but this is 99% of it):
>           - Big changes in ompi/group
>           - Moderate changes in ompi/comm
>           - Trivial changes in ompi/mpi/c, ompi/mca/pml/[dr|ob1],
>             ompi/mca/comm/sm
> WHEN:    The code is ready now in /tmp/sparse-groups (it is passing
>           all Intel and IBM tests; see below).
> TIMEOUT: We'll merge all the work to the trunk and enable the
>           possibility of using sparse groups (dense will still be the
>           default, of course) if no one objects by COB Tuesday, 31 July
>           2007.
> 
> ===========================================================================
> 
> The sparse groups work from U. Houston is ready to be brought into the
> trunk.  It is built on the premise that for very large MPI jobs, you
> don't want to fully represent MPI groups in memory if you don't have
> to.  Specifically, you can save memory for communicators/groups that
> are derived from MPI_COMM_WORLD by representing them in a sparse
> storage format.
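To make the potential savings concrete, here is a back-of-the-envelope
comparison (illustrative numbers only, assuming one pointer per rank for the
dense case; these helpers are not the actual ompi_group_t layout):

```c
/* Illustrative only: bytes needed to represent an n-rank subgroup
 * densely (one proc pointer per rank) vs. as a single
 * (first, stride, last) tuple. */
static long dense_bytes(long nprocs) { return nprocs * (long) sizeof(void *); }
static long stride_bytes(void)       { return 3 * (long) sizeof(int); }
```

For a 100,000-rank subgroup on a 64-bit machine, the dense array costs
roughly 800 KB per group, while a stride tuple costs a handful of bytes.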
> 
> The sparse groups work introduces 3 new ompi_group_t storage formats:
> 
> * dense (i.e., what it is today -- an array of ompi_proc_t pointers)
> * sparse, where the current group's contents are based on the group
>    from which the child was derived:
>    1. range: a series of (offset,length) tuples
>    2. stride: a single (first,stride,last) tuple
>    3. bitmap: a bitmap
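The three sparse representations above could be sketched roughly as follows
(struct and field names are illustrative, not the branch's actual code):

```c
#include <stdint.h>

/* 1. range: a series of (offset, length) tuples into the parent group */
typedef struct { int offset; int length; } sparse_range_t;

/* 2. stride: a single (first, stride, last) tuple */
typedef struct { int first; int stride; int last; } sparse_stride_t;

/* 3. bitmap: one bit per parent-group rank */
typedef struct { uint8_t *bits; int nbits; } sparse_bitmap_t;

/* Example translation for the stride case: a rank in the child group
 * maps to first + rank * stride in the parent group. */
static int stride_to_parent(const sparse_stride_t *s, int child_rank)
{
    return s->first + child_rank * s->stride;
}
```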
> 
> Currently, all the sparse groups code must be enabled by configuring
> with --enable-sparse-groups.  If sparse groups are enabled, each MPI
> group that is created will automatically use the storage format that
> takes the least amount of space.
> 
> The Big Issue with the sparse groups is that getting a pointer to an
> ompi_proc_t may no longer be an O(1) operation -- you can't just
> access it via comm->group->procs[i].  Instead, you have to call a
> macro.  If sparse groups are enabled, this will call a function to do
> the resolution and return the proc pointer.  If sparse groups are not
> enabled, the macro currently resolves to group->procs[i].
> 
> When sparse groups are enabled, looking up a proc pointer is an
> iterative process; you have to traverse up through one or more parent
> groups until you reach a "dense" group to get the pointer.  So the
> time to lookup the proc pointer (essentially) depends on the group and
> how many times it has been derived from a parent group (there are
> corner cases where the lookup time is shorter).  Lookup times in
> MPI_COMM_WORLD are O(1) because it is dense, but it now requires an
> inline function call rather than directly accessing the data
> structure (see below).
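The parent-chasing lookup described above might look roughly like this
(types, fields, and names are illustrative stand-ins, not the branch's
actual ompi_group_t code):

```c
#include <stddef.h>

typedef struct { int rank; } proc_t;

typedef struct grp {
    struct grp *parent;   /* NULL for a dense group */
    proc_t **procs;       /* populated only for dense groups */
    /* sparse metadata: maps a rank in this group to a parent-group rank */
    int (*to_parent_rank)(const struct grp *g, int rank);
} grp_t;

/* Walk up through sparse parents until a dense group is reached. */
static proc_t *grp_proc_lookup(const grp_t *g, int rank)
{
    while (g->parent != NULL) {          /* sparse level: translate rank */
        rank = g->to_parent_rank(g, rank);
        g = g->parent;
    }
    return g->procs[rank];               /* dense level: O(1) array access */
}

/* Example translator: child rank i maps to parent rank 2*i (even ranks). */
static int even_ranks(const grp_t *g, int rank) { (void) g; return 2 * rank; }
```

The loop makes the cost visible: one rank translation per level of
derivation, terminating at the dense root.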
> 
> Note that the code in /tmp/sparse-groups is currently out-of-date with
> respect to the orte and opal trees due to SVN merge mistakes and
> problems.  Testing has occurred by copying full orte/opal branches
> from a trunk checkout into the sparse group tree, so we're confident
> that it's compatible with the trunk.  Full integration will occur
> before committing to the trunk, of course.
> 
> The proposal we have for the community is as follows:
> 
> 1. Remove the --enable-sparse-groups configure option
> 2. Default to use only dense groups (i.e., same as today)
> 3. If the new MCA parameter "mpi_use_sparse_groups" is enabled, enable
>     the use of sparse groups
> 4. Eliminate the current macro used for group proc lookups and instead
>     use an inline function of the form:
> 
>     static inline ompi_proc_t *lookup_group(ompi_group_t *group,
>                                             int index) {
>         if (group_is_dense(group)) {
>             return group->procs[index];
>         } else {
>             return sparse_group_lookup(group, index);
>         }
>     }
> 
>     *** NOTE: This design adds a single "if" in some
>         performance-critical paths.  If the group is sparse, it will
>         add a function call and the overhead to do the lookup.
>         If the group is dense (which will be the default), the proc
>         will be returned directly from the inline function.
> 
>     The rationale is that adding a single "if" (perhaps with
>     OPAL_[UN]LIKELY?) in a few code paths will not be a big deal.
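A branch hint on that "if" might look like the sketch below.  LIKELY here is
a GCC/Clang __builtin_expect wrapper standing in for OPAL_LIKELY, and the
stub types are illustrative, not the real ompi_group_t/ompi_proc_t:

```c
#include <stddef.h>

/* Stand-in for OPAL_LIKELY: hint that the condition is usually true. */
#define LIKELY(x) __builtin_expect(!!(x), 1)

typedef struct { int rank; } ompi_proc_stub_t;
typedef struct {
    int is_dense;               /* nonzero for dense groups (the default) */
    ompi_proc_stub_t **procs;   /* populated only for dense groups */
} ompi_group_stub_t;

static ompi_proc_stub_t *sparse_lookup_stub(ompi_group_stub_t *g, int i);

static inline ompi_proc_stub_t *
group_peer_lookup(ompi_group_stub_t *group, int index)
{
    if (LIKELY(group->is_dense)) {            /* default, fast path */
        return group->procs[index];
    }
    return sparse_lookup_stub(group, index);  /* sparse, slower path */
}

/* Stub sparse lookup so the sketch compiles standalone. */
static ompi_proc_stub_t *sparse_lookup_stub(ompi_group_stub_t *g, int i)
{
    (void) g; (void) i;
    return NULL;
}
```

With the hint, the compiler lays out the dense case as the fall-through
path, so the default configuration pays only a predicted-taken branch.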
> 
> 5. Bring all these changes into the OMPI trunk and therefore into
>     v1.3.
> 
> Comments?
> 
> --
> Jeff Squyres
> Cisco Systems
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 

