> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
> On Behalf Of Jeff Squyres
> Sent: Tuesday, January 15, 2008 6:13 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] SDP support for OPEN-MPI
>
> On Jan 13, 2008, at 8:19 AM, Lenny Verkhovsky wrote:
>
> > > What I meant was try to open an SDP socket. If it fails because SDP
> > > is not supported / available to that peer, then open a regular
> > > socket. So you should still always have only 1 socket open to a
> > > peer (not 2).
> >
> > Yes, but since the listener side doesn't know on which socket to
> > expect a message, it will need both sockets to be opened.
>
> Ah, you meant the listener socket -- not 2 sockets to each peer. Ok,
> fair enough. Opening up one more listener socket in each process is
> no big deal (IMO).

I thought that on a large cluster it could be a problem.
> > > > If one of the machines does not support SDP, the user will get
> > > > an error.
> > >
> > > Well, that's one way to go, but it's certainly less friendly. It
> > > means that the entire MPI job has to support SDP -- including
> > > mpirun. What about clusters that do not have IB on the head node?
> >
> > They can use OOB over IP sockets and the BTL over SDP; it should work.
>
> Yes, I'm fine with this -- IIRC, my point was that if SDP is not
> available (and the user didn't explicitly ask for it), then it should
> not be an error.
>
> > > >> Perhaps a more general approach would be to [perhaps additionally]
> > > >> provide an MCA param to allow the user to specify the AF_* value?
> > > >> (AF_INET_SDP is a standardized value, right? I.e., will it be the
> > > >> same on all Linux variants [and someday Solaris]?)
> > > >
> > > > I didn't find any standard on it; it seems to have been "randomly"
> > > > selected, since originally it was 26 and was changed to 27 due to
> > > > a conflict with the kernel's defines.
> > >
> > > This might make an even stronger case for having an MCA param for it
> > > -- if the AF_INET_SDP value is so broken that it's effectively
> > > random, it may be necessary to override it on some platforms
> > > (especially in light of binary OMPI and OFED distributions that may
> > > not match).
> >
> > If we are talking about passing the AF_INET_SDP value only, then:
> > 1. Passing this value as an MCA parameter will not require any
> > changes to the SDP code.
> > 2. Hopefully, in the future the AF_INET_SDP value can be obtained
> > from libc, and the value will be configured automatically.
> > 3. If we are talking about the AF_INET value in general (IPv4, IPv6,
> > SDP), then by making it constant with an MCA parameter we are
> > limiting ourselves to one protocol only, without being able to fail
> > over or to use different protocols for different needs (i.e. SDP for
> > OOB and IPv4 for the BTL).
>
> I'm not sure what you mean. The AF_INET values for v4 and v6 are
> constantly compiled into OMPI via whatever values they are in the
> system header files. They're standardized values, right?

Yes.

> My understanding of what you were saying was that AF_INET_SDP is *not*
> standardized, such that it may actually be different values on
> different systems. Hence, an MPI app could be otherwise portable but
> have a wrong value for AF_INET_SDP compiled into its code.
>
> Are you saying something else?

I thought you were talking about passing the general AF_INET value (IPv4,
IPv6, SDP, ...). I do think that AF_INET_SDP will be standardized
eventually; in the meanwhile it will be a constant value for all systems.
Passing AF_INET_SDP as a parameter will not reduce the code changes, but
it will lower flexibility (using it for the BTL and OOB independently).

> > > >> Patrick's got a good point: is there a reason not to do this?
> > > >> (LD_PRELOAD and the like) Is it problematic with the remote
> > > >> orted's?
> > > >
> > > > Yes, it's problematic with remote orted's, and it is not really
> > > > as transparent as you might think, since we can't pass environment
> > > > variables to the orted's during runtime.
> > >
> > > I think this depends on your environment. If you're not using rsh
> > > (which you shouldn't be for a large cluster, which is where SDP
> > > would matter most, right?), the resource manager typically copies
> > > the environment out to the cluster nodes. So an LD_PRELOAD value
> > > should be set for the orteds as well.
> > >
> > > I agree that it's problematic for rsh, but that might also be
> > > solvable (with some limits; there's only so many characters that we
> > > can pass on the command line -- we did investigate having a wrapper
> > > to the orted at one point to accept environment variables and then
> > > launch the orted, but this was so problematic / klunky that we
> > > abandoned the idea).
> >
> > Using LD_PRELOAD will not allow us to use SDP and IP separately, i.e.
> > SDP for OOB and IP for a BTL.
>
> Why would you want to do that? I would think that the biggest win
> here would be SDP for OOB -- the heck with the BTL. The BTL was just
> done for completeness (right?); if you have OpenFabrics support, you
> should be using the verbs BTL.
>
> Perhaps I don't understand exactly what you are proposing. I was
> under the impression that you were going after a common case: mpirun
> and the MPI jobs are running on back-end compute nodes where all of
> them support SDP (although the other case -- mpirun running on the
> head node without SDP while all the MPI processes are running on
> back-end nodes with SDP -- is also not uncommon...). Are you thinking
> of something else, or are you looking for more flexibility?

I am just looking for more flexibility for the end user.

> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel