> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
> Behalf Of Jeff Squyres
> Sent: Tuesday, January 08, 2008 4:32 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] SDP support for OPEN-MPI
> 
> On Jan 8, 2008, at 7:45 AM, Lenny Verkhovsky wrote:
> 
> >> Hence, if HAVE_DECL_AF_INET_SDP==1 and using AF_INET_SDP fails to
> >> that peer, it might be desirable to try to fail over to using
> >> AF_INET_something_else.  I'm still technically on vacation :-), so I
> >> didn't look *too* closely at your patch, but I think you're doing
> >> that (failing over if AF_INET_SDP doesn't work because of
> >> EAFNOSUPPORT), which is good.
> > This is actually not implemented yet.
> > Supporting failover requires opening AF_INET sockets in addition to
> > the SDP sockets, which can be a problem on large clusters.
> 
> What I meant was try to open an SDP socket.  If it fails because SDP
> is not supported / available to that peer, then open a regular
> socket.  So you should still always have only 1 socket open to a peer
> (not 2).
Yes, but since the listener side doesn't know which kind of socket the peer
will connect on, it has to keep both an SDP and an AF_INET listening socket
open.
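For illustration, here is a rough sketch (not the actual patch) of the
connect-side fallback described above: try an SDP socket first and fall back
to a plain AF_INET socket if SDP is not available on this node.  The
AF_INET_SDP value of 27 is just what current OFED stacks use -- see below.

#include <sys/socket.h>
#include <errno.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27
#endif

static int open_stream_socket(void)
{
    int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
    if (fd < 0 && (errno == EAFNOSUPPORT || errno == EPROTONOSUPPORT)) {
        /* SDP is not supported here -- fall back to a regular TCP socket,
         * so there is still only one socket per peer on the connect side. */
        fd = socket(AF_INET, SOCK_STREAM, 0);
    }
    return fd;
}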

> 
> > If one of the machines does not support SDP, the user will get an error.
> 
> Well, that's one way to go, but it's certainly less friendly.  It
> means that the entire MPI job has to support SDP -- including mpirun.
> What about clusters that do not have IB on the head node?
>
They can run the OOB over IP sockets and the BTL over SDP; that should work.

> >> Perhaps a more general approach would be to [perhaps additionally]
> >> provide an MCA param to allow the user to specify the AF_* value?
> >> (AF_INET_SDP is a standardized value, right?  I.e., will it be the
> >> same on all Linux variants [and someday Solaris]?)
> > I didn't find any standard for it; it seems to be "randomly" selected,
> > since it was originally 26 and was changed to 27 due to a conflict with
> > the kernel's defines.
> 
> This might make an even stronger case for having an MCA param for it
> -- if the AF_INET_SDP value is so broken that it's effectively random,
> it may be necessary to override it on some platforms (especially in
> light of binary OMPI and OFED distributions that may not match).
> 
If we are talking about passing only the AF_INET_SDP value, then:
1. Passing this value as an MCA parameter will not require any changes to the
SDP code.
2. Hopefully, in the future the AF_INET_SDP value can be obtained from libc,
and the value will be configured automatically.
3. If we are talking about the address family in general (IPv4, IPv6, SDP),
then fixing it with an MCA parameter limits us to a single protocol, with no
ability to fail over or to use different protocols for different needs
(e.g. SDP for the OOB and IPv4 for the BTL).
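As a minimal sketch (not actual Open MPI code) of what such a runtime
override could look like: the parameter name below is purely illustrative,
and it is read here through the OMPI_MCA_* environment-variable form in which
MCA parameters can be set.

#include <stdlib.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27   /* current OFED value; was 26 at one point */
#endif

static int sdp_af_family(void)
{
    /* Hypothetical parameter name, for illustration only. */
    const char *val = getenv("OMPI_MCA_btl_tcp_sdp_af_family");
    return (val != NULL) ? atoi(val) : AF_INET_SDP;
}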


> >> Patrick's got a good point: is there a reason not to do this?
> >> (LD_PRELOAD and the like)  Is it problematic with the remote
> >> orteds?
> > Yes, it's problematic with remote orteds, and it's not really as
> > transparent as you might think.
> > Since we can't pass environment variables to the orteds at runtime
> 
> I think this depends on your environment.  If you're not using rsh
> (which you shouldn't be for a large cluster, which is where SDP would
> matter most, right?), the resource manager typically copies the
> environment out to the cluster nodes.  So an LD_PRELOAD value should
> be set for the orteds as well.
> 
> I agree that it's problematic for rsh, but that might also be solvable
> (with some limits; there's only so many characters that we can pass on
> the command line -- we did investigate having a wrapper to the orted
> at one point to accept environment variables and then launch the
> orted, but this was so problematic / klunky that we abandoned the
> idea).
> 
Using LD_PRELOAD would not allow us to use SDP and IP separately, e.g.
SDP for the OOB and IP for the BTL.
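For what it's worth, this is roughly what a libsdp-style LD_PRELOAD shim does
under the hood (heavily simplified; the real libsdp also consults
libsdp.conf).  It interposes socket(2) and rewrites the address family, which
is exactly why the redirection is process-wide and cannot tell the OOB apart
from the BTL.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/socket.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27
#endif

int socket(int domain, int type, int protocol)
{
    static int (*real_socket)(int, int, int) = NULL;
    if (real_socket == NULL) {
        real_socket = (int (*)(int, int, int)) dlsym(RTLD_NEXT, "socket");
    }
    /* Every AF_INET stream socket in the process is redirected to SDP. */
    if (domain == AF_INET && type == SOCK_STREAM) {
        domain = AF_INET_SDP;
        protocol = 0;
    }
    return real_socket(domain, type, protocol);
}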


> > we must preload the SDP library in each remote environment (i.e. in
> > bashrc).  This will cause all applications to use SDP instead of
> > AF_INET, which means you can't choose a specific protocol for a
> > specific application: either you use SDP or AF_INET for everything.
> > SDP can also be enabled with an appropriate
> > /usr/local/ofed/etc/libsdp.conf configuration, but an ordinary user
> > usually has no access to it.
> > (http://www.cisco.com/univercd/cc/td/doc/product/svbu/ofed/ofed_1_1/ofed_ug/sdp.htm#wp952927)
> >
> >> Andrew's got a point here, too -- accelerating the TCP BTL with
> >> SDP seems kinda pointless.  I'm guessing that you did it because it
> >> was just about the same work as was done in the TCP OOB (for which we
> >> have no corresponding verbs interface).  Is that right?
> > Indeed. But it also seems that SDP has lower overhead than VERBS in
> > some
> > cases.
> 
> Are you referring to the fact that the avail(%) column is lower for
> verbs than SDP/IPoIB?  That seems like a pretty weird metric for such
> small message counts.  What exactly does 77.5% of 0 bytes mean?
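(For reference -- inferred from the numbers below rather than from the
benchmark source -- avail(%) looks like it is simply
(1 - overhead / base_t) * 100, i.e. the fraction of the base message time
still available for computation.  E.g. for the 0-byte VERBS row,
(1 - 1.583 / 7.029) * 100 is about 77.5.)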


> 
> My $0.02 is that the other columns are more compelling.  :-)
> 
> > Tests with Sandia's overlapping benchmark
> > http://www.cs.sandia.gov/smb/overhead.html#mozTocId316713
> >
> > VERBS results
> > msgsize iterations  iter_t      work_t      overhead    base_t      avail(%)
> > 0       1000        16.892      15.309      1.583       7.029       77.5
> > 2       1000        16.852      15.332      1.520       7.144       78.7
> > 4       1000        16.932      15.312      1.620       7.128       77.3
> > 8       1000        16.985      15.319      1.666       7.182       76.8
> > 16      1000        16.886      15.297      1.589       7.219       78.0
> > 32      1000        16.988      15.311      1.677       7.251       76.9
> > 64      1000        16.944      15.299      1.645       7.457       77.9
> >
> > SDP results
> > 0       1000        134.902     128.089     6.813       54.691      87.5
> > 2       1000        135.064     128.196     6.868       55.283      87.6
> > 4       1000        135.031     128.356     6.675       55.039      87.9
> > 8       1000        130.460     125.908     4.552       52.010      91.2
> > 16      1000        135.432     128.694     6.738       55.615      87.9
> > 32      1000        135.228     128.494     6.734       55.627      87.9
> > 64      1000        135.470     128.540     6.930       56.583      87.8
> >
> > IPoIB results
> > 0       1000        252.953     247.053     5.900       119.977     95.1
> > 2       1000        253.336     247.285     6.051       121.573     95.0
> > 4       1000        254.147     247.041     7.106       122.110     94.2
> > 8       1000        254.613     248.011     6.602       121.840     94.6
> > 16      1000        255.662     247.952     7.710       124.738     93.8
> > 32      1000        255.569     248.057     7.512       127.095     94.1
> > 64      1000        255.867     248.308     7.559       132.858     94.3
> 
> 
> --
> Jeff Squyres
> Cisco Systems
> 
