Don't these allocations of bshe->smbhe_keys require some kind of memory
translation from one process's address space to another (in the
bootstrap_init function in /ompi/mca/coll/sm/coll_sm_module.c)?
If local rank 0 allocates (gets attached to) the memory, the others can't
read it without proper translation.
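
To make the concern concrete: if peers attach the same segment at different
virtual addresses, a raw pointer stored inside the segment is only valid in
the process that wrote it. A common remedy is to store offsets from the
segment base and have each process rebuild its own pointer. The sketch below
is only an illustration and is not Open MPI code: seg_header_t, seg_ptr and
demo_two_mappings are hypothetical names, and the two processes are mimicked
by mapping one temporary file twice in a single process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_SIZE 4096
#define NUM_KEYS 4

/* Hypothetical segment header (NOT the real coll/sm layout). It stores
 * where the keys live as an offset from the segment base, never as a
 * raw pointer. */
typedef struct {
    size_t keys_offset;
    size_t num_keys;
} seg_header_t;

/* Turn a segment-relative offset into a pointer valid for this mapping. */
static void *seg_ptr(void *base, size_t offset)
{
    return (char *)base + offset;
}

/* Map one file twice to mimic two processes attached at different
 * addresses. Returns 0 if the second mapping reads the keys that were
 * written through the first one, nonzero otherwise. */
int demo_two_mappings(void)
{
    FILE *f = tmpfile();
    if (f == NULL) {
        return -1;
    }
    int fd = fileno(f);
    if (ftruncate(fd, SEG_SIZE) != 0) {
        fclose(f);
        return -1;
    }

    void *a = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, SEG_SIZE);
        if (b != MAP_FAILED) munmap(b, SEG_SIZE);
        fclose(f);
        return -1;
    }

    /* "Local rank 0": lay out the keys right after the header. */
    seg_header_t *ha = a;
    ha->keys_offset = sizeof(seg_header_t);
    ha->num_keys = NUM_KEYS;
    int *keys_a = seg_ptr(a, ha->keys_offset);
    for (size_t i = 0; i < ha->num_keys; i++) {
        keys_a[i] = 100 + (int)i;
    }

    /* "Another rank": different base address, same offset. */
    seg_header_t *hb = b;
    int *keys_b = seg_ptr(b, hb->keys_offset);
    int ok = (a != b) && (hb->num_keys == NUM_KEYS);
    for (size_t i = 0; ok && i < hb->num_keys; i++) {
        ok = (keys_b[i] == 100 + (int)i);
    }

    munmap(a, SEG_SIZE);
    munmap(b, SEG_SIZE);
    fclose(f);
    return ok ? 0 : 1;
}
```

Calling demo_two_mappings() from a small main() and checking for a zero
return is enough to try it. Had rank 0 stored keys_a itself in the header,
the second mapping would have had to subtract rank 0's base and add its own
before dereferencing, which is the translation I am asking about.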
Lenny

On Mon, Aug 10, 2009 at 2:26 PM, Lenny Verkhovsky <
lenny.verkhov...@gmail.com> wrote:

> We saw these segvs too, both with and without the sm btl set.
>
> On Fri, Aug 7, 2009 at 10:51 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>>
>>
>> On Thu, Aug 6, 2009 at 3:18 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>>> Ok, with Terry's help, I found a segv in the coll sm.  If you run without
>>> the sm btl, there's an obvious bad parameter that we're passing that results
>>> in a segv.
>>>
>>> LANL -- can you confirm / deny that these are the segv's that you were
>>> seeing?
>>
>>
>> Yes we can deny that those are the segv's we were seeing - we definitely
>> had the sm btl active. I'll rerun the test on Monday and add the stacktrace
>> to your ticket.
>>
>> Ralph
>>
>>
>>>
>>> While fixing this, I noticed that the sm btl and sm coll are sharing an
>>> mpool when both are running.  This probably used to be a good idea way back
>>> when (e.g., when we were using a lot more shmem than we needed and core
>>> counts were lower), but it seems like a bad idea now (e.g., the btl/sm is
>>> fairly specific about the size of the mpool that is created -- it's just big
>>> enough for its data structures).
>>>
>>> I'm therefore going to change the mpool string names that btl/sm and
>>> coll/sm are looking for so that they get unique sm mpool modules.
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>>
>
>
