Thank you very much, Gilles. That is exactly the information I was looking
for.

Best regards
Durga

We learn from history that we never learn from history.

On Fri, Apr 8, 2016 at 12:52 AM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> At init time, each task invokes btl_openib_component_init(), which invokes
> btl_openib_modex_send().
> Basically, it collects the local InfiniBand info (port, subnet, LID, ...)
> and "pushes" it to orted via the modex mechanism.
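>
> To make that concrete, here is a minimal, self-contained C sketch of the
> kind of record each task publishes at init time. The struct fields and the
> publish_modex() helper are hypothetical illustrations of the mechanism,
> not the actual Open MPI API:
>
>   #include <stdint.h>
>   #include <stdio.h>
>
>   /* Hypothetical stand-in for the per-port info gathered in
>    * btl_openib_modex_send(); field names are illustrative only. */
>   struct ib_port_info {
>       uint64_t subnet_id;   /* InfiniBand subnet prefix */
>       uint16_t lid;         /* local id assigned by the subnet manager */
>       uint8_t  port_num;    /* HCA port number */
>   };
>
>   /* Hypothetical publish helper: the real code packs such a record and
>    * hands it to the modex, which makes it available to all peers. */
>   static void publish_modex(const char *key, const void *buf, size_t len)
>   {
>       printf("modex send: key=%s, %zu bytes\n", key, len);
>   }
>
>   int main(void)
>   {
>       struct ib_port_info info = { 0xfe80000000000000ULL, 42, 1 };
>       publish_modex("btl.openib", &info, sizeof info);
>       return 0;
>   }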
>
> When a connection is established, the remote information is retrieved via
> the modex mechanism in mca_btl_openib_proc_get_locked().
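>
> And a matching sketch of the receive side: at connection setup, the remote
> task's record is looked up by its process name. Again, modex_recv() and
> proc_name_t are hypothetical names used for illustration:
>
>   #include <stdint.h>
>   #include <stdio.h>
>
>   typedef struct { uint32_t jobid; uint32_t vpid; } proc_name_t;
>
>   struct ib_port_info { uint64_t subnet_id; uint16_t lid; uint8_t port_num; };
>
>   /* Hypothetical lookup: the real mca_btl_openib_proc_get_locked() asks
>    * the modex for the record the remote task published at init time. */
>   static int modex_recv(proc_name_t peer, struct ib_port_info *out)
>   {
>       (void)peer;  /* unused in this sketch */
>       /* pretend the modex returned the peer's published record */
>       out->subnet_id = 0xfe80000000000000ULL;
>       out->lid       = 7;
>       out->port_num  = 1;
>       return 0;
>   }
>
>   int main(void)
>   {
>       proc_name_t peer = { 1234, 3 };  /* jobid, vpid (rank) */
>       struct ib_port_info remote;
>       if (modex_recv(peer, &remote) == 0)
>           printf("peer vpid %u is reachable at LID %u\n",
>                  (unsigned)peer.vpid, (unsigned)remote.lid);
>       return 0;
>   }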
>
> Cheers,
>
> Gilles
>
>
> On 4/8/2016 1:30 PM, dpchoudh . wrote:
>
> Hi Gilles
>
> Thanks for responding quickly; however, I am afraid I did not explain my
> question clearly enough; my apologies.
>
> What I am trying to understand is this:
>
> My cluster has (say) 7 nodes. I use IP-over-Ethernet for orted (i.e. for
> job launch and control traffic); this is not used for MPI messaging. Let's
> say the IP addresses are 192.168.1.2-192.168.1.8; they are all in the same
> IP subnet.
>
> MPI messaging goes over some other interconnect, such as InfiniBand. All 7
> nodes are connected to the same InfiniBand switch and hence are in the same
> (InfiniBand) subnet as well.
>
> In my host file, I list (say) 4 IP addresses: 192.168.1.4-192.168.1.7.
>
> My question is: how does Open MPI pick the 4 InfiniBand interfaces that
> match those IP addresses? Put another way, the ranks of the launched jobs
> are (I presume) set up by orted through some mechanism. When I do an
> MPI_Send() to a given rank, the message goes to the InfiniBand interface
> with a particular LID. How does this IP-to-InfiniBand-LID mapping happen?
>
> Thanks
> Durga
>
> We learn from history that we never learn from history.
>
> On Fri, Apr 8, 2016 at 12:12 AM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
>> Hi,
>>
>> The hostnames (or their IPs) are used only to ssh to the nodes and launch
>> orted.
>>
>>
>> If you use only the tcp btl:
>>
>> TCP *MPI* communications (vs. OOB management communications) are handled
>> by btl/tcp.
>> By default, all usable interfaces are used: messages are split (IIRC, by
>> the ob1 pml) and the resulting "fragments" are sent over all interfaces.
>>
>> Each interface has a latency and a bandwidth that are used to decide how
>> to split a message into fragments.
>> (Assuming it is correctly configured, roughly 90% of a large message is
>> sent over the 10GbE interface, and 10% over the GbE interface.)
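>>
>> As a rough illustration of that split, here is a small self-contained C
>> sketch that divides a message across interfaces in proportion to their
>> bandwidth; the interface names and numbers are made up for the example:
>>
>>   #include <stdio.h>
>>
>>   /* Split a message across interfaces proportionally to bandwidth.
>>    * With 10000 vs 1000 Mbps this yields roughly a 90/10 split. */
>>   int main(void)
>>   {
>>       const char  *ifname[]  = { "10GbE", "GbE" };
>>       const double bw_mbps[] = { 10000.0, 1000.0 };
>>       const size_t msg_len   = 1 << 20;  /* 1 MiB message */
>>       double total = 0.0;
>>       for (int i = 0; i < 2; i++) total += bw_mbps[i];
>>       for (int i = 0; i < 2; i++) {
>>           size_t frag = (size_t)(msg_len * (bw_mbps[i] / total));
>>           printf("%-5s gets %zu bytes (%.0f%%)\n",
>>                  ifname[i], frag, 100.0 * bw_mbps[i] / total);
>>       }
>>       return 0;
>>   }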
>>
>> You can explicitly include or exclude interfaces with
>> mpirun --mca btl_tcp_if_include ...
>> or
>> mpirun --mca btl_tcp_if_exclude ...
>>
>> (see ompi_info --all for the syntax)
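>>
>> For example (the interface names and the application here are
>> placeholders for whatever your cluster actually has):
>>
>>   # restrict TCP MPI traffic to two specific interfaces
>>   mpirun --mca btl_tcp_if_include eth0,eth2 -np 4 ./my_app
>>
>>   # or select interfaces by subnet, in CIDR notation
>>   mpirun --mca btl_tcp_if_include 192.168.1.0/24 -np 4 ./my_app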
>>
>>
>> But if you use several btls (for example tcp and openib), the btl(s) with
>> the lower exclusivity are not used.
>> (For example, a large message is *not* split and sent over native IB,
>> IPoIB and GbE at the same time, because the openib btl has a higher
>> exclusivity than the tcp btl.)
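>>
>> A toy C sketch of that selection rule; the struct and the exclusivity
>> values are invented for illustration (in Open MPI each btl advertises its
>> own exclusivity, tunable as an MCA parameter):
>>
>>   #include <stdio.h>
>>
>>   struct btl { const char *name; int exclusivity; };
>>
>>   int main(void)
>>   {
>>       /* invented example values: openib outranks tcp */
>>       struct btl btls[] = { { "openib", 1024 }, { "tcp", 100 } };
>>       int n = 2, max = 0;
>>       for (int i = 0; i < n; i++)
>>           if (btls[i].exclusivity > max) max = btls[i].exclusivity;
>>       for (int i = 0; i < n; i++)
>>           printf("%-6s %s\n", btls[i].name,
>>                  btls[i].exclusivity == max
>>                      ? "used" : "skipped (lower exclusivity)");
>>       return 0;
>>   }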
>>
>>
>> Did this answer your question?
>>
>> Cheers,
>>
>> Gilles
>>
>>
>>
>> On 4/8/2016 12:24 PM, dpchoudh . wrote:
>>
>> Hello all
>>
>> (Newbie warning! Sorry :-(  )
>>
>> Let's say my cluster has 7 nodes, connected via IP-over-Ethernet for
>> control traffic and some kind of raw verbs (or anything else, such as SRIO)
>> interface for data transfer. Let's say my host file chooses 4 of the 7
>> nodes for an MPI job, based on the IP addresses, which are assigned to the
>> Ethernet interfaces.
>>
>> My question is: where in the code is the mapping from IP to
>> whatever-interface-is-used-for-MPI_Send/Recv determined, so that only the
>> chosen nodes receive traffic over the verbs interface?
>>
>> Thanks in advance
>> Durga
>>
>> We learn from history that we never learn from history.