Don't these allocations of bshe->smbhe_keys require some kind of memory translation from one process's address space to another (in the bootstrap_init function in /ompi/mca/coll/sm/coll_sm_module.c)? If local rank 0 allocates (gets attached to) memory, the other ranks can't read it without proper translation.

Lenny
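To make the concern concrete, here is a minimal generic sketch of offset-based address translation in C. It is not taken from coll_sm_module.c and makes no claim about what bootstrap_init actually does. The helper names (ptr_to_offset, offset_to_ptr, demo_offset_translation) are hypothetical, and the two mmap views of one temporary file only stand in for two processes that attach the same segment at different virtual addresses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not an Open MPI API): turn a local pointer into
 * an offset from the start of the shared segment. */
static size_t ptr_to_offset(const void *base, const void *ptr)
{
    return (size_t)((const char *)ptr - (const char *)base);
}

/* Hypothetical helper: rebuild a local pointer from a segment offset. */
static void *offset_to_ptr(void *base, size_t offset)
{
    return (char *)base + offset;
}

/* Map one backing file twice to mimic two processes attaching the same
 * segment at different addresses. Returns 0 when a value written through
 * the first view is read back through the second view via its offset,
 * 1 when the check fails, and -1 when the setup itself fails. */
static int demo_offset_translation(void)
{
    const size_t len = 4096;
    FILE *backing = tmpfile();
    if (backing == NULL) {
        return -1;
    }
    int fd = fileno(backing);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(backing);
        return -1;
    }

    char *view_a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *view_b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (view_a == MAP_FAILED || view_b == MAP_FAILED) {
        if (view_a != MAP_FAILED) munmap(view_a, len);
        if (view_b != MAP_FAILED) munmap(view_b, len);
        fclose(backing);
        return -1;
    }

    /* "Rank 0" picks a slot and publishes its offset, not its address. */
    int *slot_a = (int *)(view_a + 128);
    *slot_a = 42;
    size_t published = ptr_to_offset(view_a, slot_a);

    /* "Rank 1" rebuilds the pointer against its own base address. */
    int *slot_b = offset_to_ptr(view_b, published);
    int ok = (view_a != view_b) && (*slot_b == 42);

    munmap(view_a, len);
    munmap(view_b, len);
    fclose(backing);
    return ok ? 0 : 1;
}
```

Calling demo_offset_translation() from a main() exercises the whole round trip. The raw address slot_a is only meaningful relative to view_a, so the second view has to go through the offset to reach the same data.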
On Mon, Aug 10, 2009 at 2:26 PM, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote:

> We saw these segvs too, with and without setting sm btl.
>
> On Fri, Aug 7, 2009 at 10:51 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> On Thu, Aug 6, 2009 at 3:18 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>>> Ok, with Terry's help, I found a segv in the coll sm. If you run without
>>> the sm btl, there's an obvious bad parameter that we're passing that results
>>> in a segv.
>>>
>>> LANL -- can you confirm / deny that these are the segv's that you were
>>> seeing?
>>
>> Yes we can deny that those are the segv's we were seeing - we definitely
>> had the sm btl active. I'll rerun the test on Monday and add the stacktrace
>> to your ticket.
>>
>> Ralph
>>
>>> While fixing this, I noticed that the sm btl and sm coll are sharing an
>>> mpool when both are running. This probably used to be a good idea way back
>>> when (e.g., when we were using a lot more shmem than we needed and core
>>> counts were lower), but it seems like a bad idea now (e.g., the btl/sm is
>>> fairly specific about the size of the mpool that is created -- it's just big
>>> enough for its data structures).
>>>
>>> I'm therefore going to change the mpool string names that btl/sm and
>>> coll/sm are looking for so that they get unique sm mpool modules.
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel