I updated the RFC.

> From: Jeff Squyres <jsquy...@cisco.com>
> Date: July 25, 2007 9:04:44 AM EDT
> To: Open Developers <de...@open-mpi.org>
> Subject: [OMPI devel] [RFC] Sparse group implementation
> Reply-To: Open MPI Developers <de...@open-mpi.org>
>
> WHAT:    Merge the sparse groups work to the trunk; get the community's
>          opinion on one remaining issue.
> WHY:     For large MPI jobs, it can be memory-prohibitive to fully
>          represent dense groups; you can save a lot of space by having
>          "sparse" representations of groups that are (for example)
>          derived from MPI_COMM_WORLD.
> WHERE:   Main changes are (might have missed a few in this analysis,
>          but this is 99% of it):
>          - Big changes in ompi/group
>          - Moderate changes in ompi/comm
>          - Trivial changes in ompi/mpi/c, ompi/mca/pml/[dr|ob1],
>            ompi/mca/comm/sm
> WHEN:    The code is ready now in /tmp/sparse (it is passing
>          all Intel and IBM tests; see below).
> TIMEOUT: We'll merge all the work to the trunk and enable the
>          possibility of using sparse groups (dense will still be the
>          default, of course) if no one objects by COB Tuesday, 31 Aug
>          2007.
>
> ======================================================================
>
> The sparse groups work from U. Houston is ready to be brought into the
> trunk.  It is built on the premise that for very large MPI jobs, you
> don't want to fully represent MPI groups in memory if you don't have
> to.  Specifically, you can save memory for communicators/groups that
> are derived from MPI_COMM_WORLD by representing them in a sparse
> storage format.
>
> The sparse groups work introduces 3 new ompi_group_t storage formats:
>
> * dense (i.e., what it is today -- an array of ompi_proc_t pointers)
> * sparse, where the current group's contents are based on the group
>   from which the child was derived:
>   1. range: a series of (offset,length) tuples
>   2. stride: a single (first,stride,last) tuple
>   3. bitmap: a bitmap
>
> Currently, all the sparse groups code must be enabled by configuring
> with --enable-sparse-groups.  If sparse groups are enabled, each MPI
> group that is created will automatically use the storage format that
> takes the least amount of space.
>
> The Big Issue with the sparse groups is that getting a pointer to an
> ompi_proc_t may no longer be an O(1) operation -- you can't just
> access it via comm->group->procs[i].  Instead, you have to call a
> macro.  If sparse groups are enabled, this will call a function to do
> the resolution and return the proc pointer.  If sparse groups are not
> enabled, the macro currently resolves to group->procs[i].

Actually, there is no macro anymore. Brian suggested that we make it an
inline function (ompi_group_peer_lookup) that checks whether sparse
groups are enabled (#if OMPI_GROUP_SPARSE) and acts accordingly.

> When sparse groups are enabled, looking up a proc pointer is an
> iterative process; you have to traverse up through one or more parent
> groups until you reach a "dense" group to get the pointer.  So the
> time to lookup the proc pointer (essentially) depends on the group and
> how many times it has been derived from a parent group (there are
> corner cases where the lookup time is shorter).
> Lookup times in MPI_COMM_WORLD are O(1) because it is dense, but it now
> requires an inline function call rather than directly accessing the
> data structure (see below).
>
> Note that the code in /tmp/sparse-groups is currently out-of-date with
> respect to the orte and opal trees due to SVN merge mistakes and
> problems.  Testing has occurred by copying full orte/opal branches
> from a trunk checkout into the sparse group tree, so we're confident
> that it's compatible with the trunk.  Full integration will occur
> before committing to the trunk, of course.

A new branch has been created in /tmp/sparse that works perfectly.

> The proposal we have for the community is as follows:
>
> 1. Remove the --enable-sparse-groups configure option
> 2. Default to use only dense groups (i.e., same as today)
> 3. If the new MCA parameter "mpi_use_sparse_groups" is enabled, enable
>    the use of sparse groups

The configure option will be kept. We will also have a runtime option
(mpi_use_sparse_groups) that is set by default when sparse groups are
enabled at configure time.

> 4. Eliminate the current macro used for group proc lookups and instead
>    use an inline function of the form:
>
>    static inline ompi_proc_t *lookup_group(ompi_group_t *group,
>                                            int index) {
>        if (group_is_dense(group)) {
>            return group->procs[index];
>        } else {
>            return sparse_group_lookup(group, index);
>        }
>    }

Done; however, the inline function uses #if instead of if().

>    *** NOTE: This design adds a single "if" in some
>    performance-critical paths.  If the group is sparse, it will
>    add a function call and the overhead to do the lookup.
>    If the group is dense (which will be the default), the proc
>    will be returned directly from the inline function.
>
>    The rationale is that adding a single "if" (perhaps with
>    OPAL_[UN]LIKELY?) in a few code paths will not be a big deal.
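
To make the lookup discussion above concrete, here is a minimal sketch
of a compile-time-switched lookup that walks up the parent chain until
it reaches a dense group. It is an illustration only, not the code in
the branch: every demo_ name is a hypothetical stand-in for the real
Open MPI types, DEMO_GROUP_SPARSE stands in for OMPI_GROUP_SPARSE, and
only the stride format is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for OMPI_GROUP_SPARSE (assumption: a 0/1 compile-time flag). */
#ifndef DEMO_GROUP_SPARSE
#define DEMO_GROUP_SPARSE 1
#endif

/* Hypothetical stand-in for ompi_proc_t. */
typedef struct demo_proc {
    int rank_in_world;
} demo_proc_t;

typedef enum { DEMO_DENSE, DEMO_STRIDE } demo_format_t;

/* Hypothetical stand-in for ompi_group_t. */
typedef struct demo_group {
    demo_format_t format;
    int size;                    /* number of procs in this group       */
    demo_proc_t **procs;         /* dense only: array of proc pointers  */
    struct demo_group *parent;   /* sparse only: group we derived from  */
    int first, stride, last;     /* sparse only: (first,stride,last)    */
} demo_group_t;

static inline demo_proc_t *
demo_group_peer_lookup(const demo_group_t *group, int index)
{
#if DEMO_GROUP_SPARSE
    /* Translate the index into the parent's numbering, one level at a
     * time, until a dense group is reached. */
    while (group->format != DEMO_DENSE) {
        assert(index >= 0 && index < group->size);
        index = group->first + index * group->stride;
        group = group->parent;
    }
#endif
    /* Dense case: a direct array access, as it is today. */
    assert(index >= 0 && index < group->size);
    return group->procs[index];
}
```

With the flag set to 0 the loop is compiled out entirely, so the dense
path pays for neither a branch nor a function call; the cost of a
sparse lookup grows with the number of derivations between the group
and its dense ancestor.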
Another proposal that I mentioned before is to keep the sparse
parameters in the group structure (not compile them out) when sparse
groups are disabled. That would remove almost all #ifs from the code
and make it much easier on the eyes (the main reason). Brian had some
objections. Again, the extra parameters would be 5 integers and 3
pointers.

> 5. Bring all these changes into the OMPI trunk and therefore into
>    v1.3.
>
> Comments?  Anyone?
>
> --
> Jeff Squyres
> Cisco Systems

--
Mohamad Chaarawi
Instructional Assistant
http://www.cs.uh.edu/~mschaara
Department of Computer Science
University of Houston
4800 Calhoun, PGH Room 526
Houston, TX 77204, USA