OK, with Terry's help, I found a segv in coll/sm. If you run without the sm btl, we pass an obviously bad parameter, and that results in the segv.

LANL -- can you confirm / deny that these are the segvs that you were seeing?

While fixing this, I noticed that the sm btl and sm coll are sharing an mpool when both are running. This probably used to be a good idea way back when (e.g., when we were using a lot more shmem than we needed and core counts were lower), but it seems like a bad idea now (e.g., the btl/sm is fairly specific about the size of the mpool that is created -- it's just big enough for its data structures).

I'm therefore going to change the mpool string names that btl/sm and coll/sm are looking for so that they get unique sm mpool modules.
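
To illustrate why the name matters, here is a toy sketch of a registry that hands out pools by name. This is not the actual Open MPI mpool code: the type `pool_t`, the function `pool_lookup_or_create`, and the names "sm" and "sm_coll" are all made up for this example. The assumption it demonstrates is that two callers asking for the same name end up with the same pool, sized by whoever asked first.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a shared-memory pool module
 * (not an Open MPI type). */
typedef struct pool {
    char   name[32];
    size_t size;
} pool_t;

#define MAX_POOLS 8
static pool_t *registry[MAX_POOLS];
static int     num_pools = 0;

/* Return the pool registered under `name`, creating it on first request.
 * A later caller that asks for the same name gets the existing pool, and
 * the size it asked for is ignored. */
static pool_t *pool_lookup_or_create(const char *name, size_t size)
{
    for (int i = 0; i < num_pools; i++) {
        if (strcmp(registry[i]->name, name) == 0) {
            return registry[i];
        }
    }
    if (num_pools == MAX_POOLS) {
        return NULL;            /* registry full */
    }
    pool_t *p = malloc(sizeof(*p));
    if (p == NULL) {
        return NULL;            /* out of memory */
    }
    strncpy(p->name, name, sizeof(p->name) - 1);
    p->name[sizeof(p->name) - 1] = '\0';
    p->size = size;
    registry[num_pools++] = p;
    return p;
}
```

In this sketch, if one component calls `pool_lookup_or_create("sm", 1024)` and another then calls `pool_lookup_or_create("sm", 4096)`, the second gets the 1024-byte pool back. If the second asks for "sm_coll" instead, it gets its own pool at the size it requested.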

--
Jeff Squyres
jsquy...@cisco.com
