OK, with Terry's help, I found a segv in coll sm. If you run
without the sm btl, we pass an obviously bad parameter, and that
causes the segv.

LANL -- can you confirm / deny that these are the segvs that you were
seeing?

While fixing this, I noticed that the sm btl and sm coll share
an mpool when both are running. This was probably a good idea
way back when (e.g., when we were using a lot more shmem than we
needed and core counts were lower), but it seems like a bad idea now:
btl/sm is fairly specific about the size of the mpool that gets
created, and it's just big enough for btl/sm's own data structures.

I'm therefore going to change the mpool string names that btl/sm and
coll/sm are looking for so that they get unique sm mpool modules.
--
Jeff Squyres
jsquy...@cisco.com