Re: [OMPI devel] Device failover on ob1

Brian Barrett Sun, 2 Aug 2009 00:55:10 -0400

While I agree that performance impact (latency in this case) is important, I disagree that this necessarily belongs somewhere other than ob1. For example, a zero-performance-impact solution would be to provide two versions of all the interface functions, one with failover turned on and one with it turned off, and select the appropriate functions at initialization time. There are others, including careful placement of decision logic, which are likely to result in near-zero impact. I'm not attempting to prescribe a solution, but refuting the claim that this can't be in ob1 - I think more data is needed before such a claim is made.
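
As a rough illustration of the "two versions, selected at initialization" idea, here is a minimal sketch in C. All names (pml_ops, send_plain, send_failover, pml_select_ops, pml_send) are hypothetical and are not ob1's actual interface; the point is only that the failover flag is tested once at init and never on the send path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical send interface; these names are illustrative, not ob1's API. */
typedef int (*send_fn_t)(const void *buf, size_t len);

/* Counts sends that the failover variant recorded for possible retransmission. */
static int tracked_sends = 0;

/* Fast path: no failover bookkeeping at all. */
static int send_plain(const void *buf, size_t len)
{
    (void)buf;
    return (int)len;            /* stand-in for handing the buffer to a BTL */
}

/* Failover path: the same work plus the extra bookkeeping. */
static int send_failover(const void *buf, size_t len)
{
    tracked_sends++;            /* stand-in for storing the descriptor */
    return send_plain(buf, len);
}

/* Table of interface functions, filled in once. */
struct pml_ops {
    send_fn_t send;
};

static struct pml_ops ops;

/* Called once at initialization; this is the only place the flag is read. */
void pml_select_ops(bool enable_failover)
{
    ops.send = enable_failover ? send_failover : send_plain;
}

/* Hot path: a single indirect call, identical in both configurations. */
int pml_send(const void *buf, size_t len)
{
    return ops.send(buf, len);
}
```

With failover disabled, pml_send() goes straight to send_plain() and pays nothing for the failover code beyond the indirect call, which a component interface built on function pointers pays anyway.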

Mouhamed - can the openib btl try to re-establish a connection between two peers today (with your ob1 patches, obviously)? Would this allow us to adapt to changing routes due to switch failures (assuming that there are other physical routes around the failed switch, of course)?


Thanks,

Brian

On Aug 1, 2009, at 6:21 PM, Graham, Richard L. wrote:

What is the impact on sm, which is by far the most sensitive to latency? This really belongs in a place other than ob1. Ob1 is supposed to provide the lowest latency possible, and other PMLs are supposed to be used for heavier weight protocols.
On the technical side, how do you distinguish between a lost acknowledgement and an undelivered message? You really don't want to try and deliver data into user space twice, as once a receive is complete, who knows what the user has done with that buffer? A general treatment needs to be able to handle false negatives, and attempts to deliver the data more than once.
How are you detecting missing acknowledgements? Are you using some sort of timer?
Rich

On 7/31/09 5:49 AM, "Mouhamed Gueye" <[email protected]> wrote:

Hi list,

Here is an update on our work concerning device failover.

As many of you suggested, we reoriented our work on ob1 rather than dr
and we now have a working prototype on top of ob1. The approach is to
store btl descriptors sent to peers and delete them when we receive
proof of delivery. So far, we rely on completion callback functions,
assuming that the message is delivered when the completion function is
called, which is the case for openib. When a btl module fails, it is
removed from the endpoint's btl list and the next one is used to
retransmit stored descriptors. No extra message is transmitted; the
scheme only consists of additions to the header. It has been mainly
tested with two IB modules, in both multi-rail (two separate networks)
and multi-path (a big unique network).
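
The store-then-retransmit approach described above can be sketched roughly as follows. This is a toy model under stated assumptions, not code from the patch: the types and functions (pending_desc, endpoint, ep_send, ep_complete, ep_btl_failed) are hypothetical, the store is a fixed-size array, and "sending" is reduced to recording which BTL a descriptor was posted on.

```c
#include <stddef.h>

#define MAX_PENDING 8
#define MAX_BTLS    2

/* Hypothetical record of a sent descriptor; real BTL descriptors carry more. */
struct pending_desc {
    int in_use;
    int msg_id;
    int btl;                    /* index of the BTL it was last posted on */
};

/* Hypothetical endpoint: a list of BTLs plus the store of unconfirmed sends. */
struct endpoint {
    int btl_alive[MAX_BTLS];
    struct pending_desc pending[MAX_PENDING];
};

/* Returns the first usable BTL index, or -1 if none is left. */
static int first_alive_btl(const struct endpoint *ep)
{
    for (int i = 0; i < MAX_BTLS; i++) {
        if (ep->btl_alive[i]) {
            return i;
        }
    }
    return -1;
}

/* Send: store the descriptor, then post it. Returns the BTL used, or -1. */
int ep_send(struct endpoint *ep, int msg_id)
{
    int btl = first_alive_btl(ep);
    if (btl < 0) {
        return -1;
    }
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!ep->pending[i].in_use) {
            ep->pending[i].in_use = 1;
            ep->pending[i].msg_id = msg_id;
            ep->pending[i].btl = btl;
            return btl;
        }
    }
    return -1;                  /* store is full */
}

/* Completion callback: treated as proof of delivery, so drop the copy. */
void ep_complete(struct endpoint *ep, int msg_id)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (ep->pending[i].in_use && ep->pending[i].msg_id == msg_id) {
            ep->pending[i].in_use = 0;
            return;
        }
    }
}

/* BTL failure: remove it from the list and re-post what it still owed.
 * Returns the number of descriptors moved, or -1 if no BTL remains. */
int ep_btl_failed(struct endpoint *ep, int failed_btl)
{
    ep->btl_alive[failed_btl] = 0;
    int next = first_alive_btl(ep);
    if (next < 0) {
        return -1;
    }
    int moved = 0;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (ep->pending[i].in_use && ep->pending[i].btl == failed_btl) {
            ep->pending[i].btl = next;   /* stand-in for the retransmission */
            moved++;
        }
    }
    return moved;
}
```

Only descriptors whose completion callback has not yet fired are moved to the next BTL; anything already confirmed has been deleted from the store and is never sent again.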

You can grab and test the patch here (applies on top of the trunk):
http://bitbucket.org/gueyem/ob1-failover/

To compile with failover support, just pass --enable-device-failover
to configure. You can then run a benchmark, disconnect a port and see
the failover operate.

A small latency increase (~2%) is induced by the failover layer when
no failover occurs. To accelerate the failover process on openib, you
can try lowering the btl_openib_ib_timeout openib parameter, for
example to 15 instead of 20 (the default value).
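
To give a feel for what that change means, the sketch below assumes btl_openib_ib_timeout is used as the InfiniBand local ACK timeout exponent, where the timeout is 4.096 microseconds times 2 to the power of the value. The helper name ib_ack_timeout_us is hypothetical, and a value of 0 (which has a special meaning in InfiniBand) is not handled.

```c
#include <stdint.h>

/* Hypothetical helper: ACK timeout in microseconds for a given exponent,
 * using the InfiniBand formula 4.096 us * 2^exp. Assumes 1 <= exp <= 31. */
double ib_ack_timeout_us(unsigned exp)
{
    return 4.096 * (double)((uint64_t)1 << exp);
}
```

Under that assumption, 20 corresponds to roughly 4.3 seconds per attempt and 15 to roughly 0.13 seconds, a factor of 32; any retries before the error is reported multiply the total accordingly.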

Mouhamed
_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/
