I'm not sure the two questions in your second item are separable, Eugene. I
fear the only real solution will be to re-architect the sm BTL, whose original
design was flawed. I think you did a great job of building on it, but we are
now finding that the foundation is just too shaky, so no matter what we do to
patch it, it will still fail.

Not putting words in Brian's mouth, but I believe this is what he is trying
to gently communicate.


On Wed, Jun 24, 2009 at 8:38 AM, Eugene Loh <eugene....@sun.com> wrote:

> Brian Barrett wrote:
>
>> Or go to what I proposed and USE A LINKED LIST! (As I said before, not
>> an original idea, but one I think has merit.) Then you don't have to size
>> the fifo, because there isn't a fifo. Limit the number of send fragments
>> any one proc can allocate and the only place memory can grow without bound
>> is the OB1 unexpected list. Then use SEND_COMPLETE instead of SEND_NORMAL
>> in the collectives without barrier semantics (bcast, reduce, gather,
>> scatter) and you effectively limit how far ahead any one proc can get to
>> something that we can handle, with no performance hit.
>>
>
> I'm still digesting George's mail and trac comments and responses thereto.
> Meanwhile, a couple of questions here.
>
> First, I think it'd be helpful if you said a few words about how you think
> a linked list should be used here.  I can think of a couple of different
> ways, and I have questions about each way.  Instead of my enumerating these
> ways and those questions, how about you just be more specific?  (We used to
> grow the FIFOs, so sizing them didn't use to be an issue.)
>
> Second, I'm curious how elaborate a fix I should be trying for here.
> Are we looking for something to fix the problems at hand, or are we opening
> the door to rearchitecting a big part of the sm BTL?
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
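
To make the linked-list suggestion quoted above a bit more concrete, here is a
rough sketch of one way it could look. This is only an illustration and not the
sm BTL's actual code: every name in it (frag_t, frag_queue_t, sender_t,
MAX_FRAGS_PER_SENDER, sender_enqueue, receiver_dequeue) is made up, and it is
single-threaded, so it leaves out the shared-memory allocation and the
atomics or locks a real implementation would need. It shows only the two ideas
from the quoted text: the receive queue is a linked list, so there is no FIFO
to size, and each sender is capped on the number of fragments it may have
outstanding.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap on fragments one sender may have in flight. */
#define MAX_FRAGS_PER_SENDER 4

/* One message fragment, linked into the receiver's queue. */
typedef struct frag {
    struct frag *next;
    int sender;      /* rank of the process that allocated the fragment */
    int payload;     /* stand-in for the real fragment data */
} frag_t;

/* Receiver-side queue: a singly linked list with head and tail pointers. */
typedef struct {
    frag_t *head;
    frag_t *tail;
} frag_queue_t;

/* Per-sender bookkeeping: how many fragments are not yet consumed. */
typedef struct {
    int rank;
    int outstanding;
} sender_t;

/* Append a fragment to the queue.
 * Returns 0 on success, -1 if the sender is at its cap or malloc fails. */
static int sender_enqueue(sender_t *s, frag_queue_t *q, int payload)
{
    frag_t *f;

    if (s->outstanding >= MAX_FRAGS_PER_SENDER)
        return -1;                 /* sender must wait for the receiver */

    f = malloc(sizeof *f);
    if (f == NULL)
        return -1;

    f->next = NULL;
    f->sender = s->rank;
    f->payload = payload;

    if (q->tail != NULL)
        q->tail->next = f;
    else
        q->head = f;               /* queue was empty */
    q->tail = f;

    s->outstanding++;
    return 0;
}

/* Remove the oldest fragment and give its slot back to the sender.
 * senders[] is indexed by rank.
 * Returns 0 and stores the payload, or -1 if the queue is empty. */
static int receiver_dequeue(frag_queue_t *q, sender_t *senders, int *payload)
{
    frag_t *f = q->head;

    if (f == NULL)
        return -1;

    q->head = f->next;
    if (q->head == NULL)
        q->tail = NULL;            /* queue is now empty */

    *payload = f->payload;
    senders[f->sender].outstanding--;
    free(f);
    return 0;
}
```

With this shape the queue itself never needs a size, and the memory any one
sender can tie up is bounded by MAX_FRAGS_PER_SENDER fragments. A sender that
gets -1 back would have to retry after the receiver has drained some of its
fragments.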
