Thank you very much, Gilles. That is exactly the information I was looking for.
Best regards
Durga

We learn from history that we never learn from history.

On Fri, Apr 8, 2016 at 12:52 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> At init time, each task invokes btl_openib_component_init(), which invokes
> btl_openib_modex_send().
> Basically, it collects the InfiniBand info (port, subnet, LID, ...) and
> "pushes" it to orted via the modex mechanism.
>
> When a connection is created, the remote information is retrieved via the
> modex mechanism in mca_btl_openib_proc_get_locked().
>
> Cheers,
>
> Gilles
>
>
> On 4/8/2016 1:30 PM, dpchoudh . wrote:
>
> Hi Gilles
>
> Thanks for responding quickly; however, I am afraid I did not explain my
> question clearly enough; my apologies.
>
> What I am trying to understand is this:
>
> My cluster has (say) 7 nodes. I use IP-over-Ethernet for orted (for job
> launch and control traffic); this is not used for MPI messaging. Let's
> say that the IP addresses are 192.168.1.2-192.168.1.8. They are all in
> the same IP subnet.
>
> MPI messaging uses some other interconnect, such as InfiniBand. All 7
> nodes are connected to the same InfiniBand switch and hence are in the
> same (InfiniBand) subnet as well.
>
> In my host file, I list (say) 4 IP addresses: 192.168.1.4-192.168.1.7.
>
> My question is: how does Open MPI pick the 4 InfiniBand interfaces that
> match the IP addresses? Put another way, the ranks of the launched jobs
> are (I presume) set up by orted by some mechanism. When I do an MPI_Send()
> to a given rank, the message goes to the InfiniBand interface with a
> particular LID. How does this IP-to-InfiniBand-LID mapping happen?
>
> Thanks
> Durga
>
> We learn from history that we never learn from history.
>
> On Fri, Apr 8, 2016 at 12:12 AM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
>> Hi,
>>
>> The hostnames (or their IPs) are only used to ssh orted.
>>
>> If you use only the tcp btl:
>>
>> TCP *MPI* communications (vs. OOB management communications) are handled
>> by btl/tcp.
>> By default, all usable interfaces are used; messages are split into
>> "fragments" (iirc, by the ob1 pml), and the fragments are sent using all
>> the interfaces.
>>
>> Each interface has a latency and a bandwidth that are used to decide how
>> to split a message into fragments.
>> (Assuming everything is correctly configured, 90% of a large message is
>> sent over the 10GbE interface, and 10% is sent over the GbE interface.)
>>
>> You can explicitly include/exclude interfaces:
>> mpirun --mca btl_tcp_if_include ...
>> or
>> mpirun --mca btl_tcp_if_exclude ...
>>
>> (see ompi_info --all for the syntax)
>>
>> But if you use several btls (for example, tcp and openib), the btl(s)
>> with the lower exclusivity are not used.
>> (For example, a large message is *not* split and sent using native IB,
>> IPoIB and GbE, because the openib btl has a higher exclusivity than the
>> tcp btl.)
>>
>> Did this answer your question?
>>
>> Cheers,
>>
>> Gilles
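To make the include/exclude selection in Gilles's reply above concrete: the MCA parameters btl_tcp_if_include and btl_tcp_if_exclude accept either interface names or CIDR subnets. A minimal sketch follows; the subnet, interface names, host file, and binary are illustrative assumptions, not taken from the thread:

    # keep MPI traffic on one subnet (CIDR form, assumed here);
    # "self" is needed so a rank can send to itself
    mpirun --mca btl tcp,self --mca btl_tcp_if_include 192.168.1.0/24 \
           -np 4 -hostfile hosts ./a.out

    # or name the interfaces to skip instead (the loopback, plus the GbE
    # control interface, assumed here to be eth0)
    mpirun --mca btl tcp,self --mca btl_tcp_if_exclude lo,eth0 \
           -np 4 -hostfile hosts ./a.out

The two parameters are mutually exclusive: set one or the other, not both.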
>>
>> On 4/8/2016 12:24 PM, dpchoudh . wrote:
>>
>> Hello all
>>
>> (Newbie warning! Sorry :-( )
>>
>> Let's say my cluster has 7 nodes, connected via IP-over-Ethernet for
>> control traffic and some kind of raw verbs (or anything else, such as
>> SRIO) interface for data transfer. Let's say my host file chooses 4 of
>> the 7 nodes for an MPI job, based on their IP addresses, which are
>> assigned to the Ethernet interfaces.
>>
>> My question is: where in the code is the mapping from an IP address to
>> whatever interface is used for MPI_Send()/MPI_Recv() determined, such
>> that only the chosen nodes receive traffic over the verbs interface?
>>
>> Thanks in advance
>> Durga
>>
>> We learn from history that we never learn from history.
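Tying the two answers together: the "where in the code" question is resolved by the modex exchange Gilles describes at the top of the thread. The sketch below shows that publish/lookup pattern in miniature. It is illustrative C, not the actual Open MPI source: modex_put()/modex_get() are hypothetical stand-ins for the real modex API, and the struct carries only a subset of what the openib btl really publishes.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for Open MPI's modex API. */
    int modex_put(const char *key, const void *data, size_t len);
    int modex_get(const char *key, int peer_rank, void *data, size_t len);

    /* Illustrative per-task fabric info; the real openib btl publishes
     * more (one entry per active HCA port, connection data, etc.). */
    struct ib_endpoint_info {
        uint64_t subnet_prefix;   /* InfiniBand subnet the port belongs to */
        uint16_t lid;             /* LID assigned by the subnet manager */
        uint8_t  port_num;        /* HCA port number */
    };

    /* At MPI_Init time: each task pushes its local fabric info to the
     * runtime (the role btl_openib_modex_send() plays). */
    int publish_local_info(const struct ib_endpoint_info *local)
    {
        return modex_put("btl.openib", local, sizeof *local);
    }

    /* At connection-setup time: fetch the peer's info by rank (the role
     * mca_btl_openib_proc_get_locked() relies on). */
    int lookup_peer_info(int peer_rank, struct ib_endpoint_info *remote)
    {
        return modex_get("btl.openib", peer_rank, remote, sizeof *remote);
    }

Note that the lookup is keyed on the peer process, not on an IP address: as Gilles says, the hostnames/IPs in the host file are only used to ssh orted, while the InfiniBand endpoint info (subnet, LID, port) travels through the modex.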