Re: [OMPI devel] RFC: usnic BTL MPI_T pvar scheme

Jeff Squyres (jsquyres) Tue, 5 Nov 2013 21:06:16 -0500 (EST)

On Nov 5, 2013, at 2:59 PM, George Bosilca <bosi...@icl.utk.edu> wrote:


> I have a question regarding the extension of this concept to multi-BTL
> runs. Granted we will have to have a local indexing of BTL (I'm not
> concerned about this). But how do we ensure the naming is globally
> consistent (in the sense that all processes in the job will agree that
> usnic0 is index 0) even when we have a heterogeneous environment?

MPI_T pvars are local-only, so it shouldn't matter if index 0 is usnic_0 in 
proc A but usnic_3 in proc B.  More specifically: these values only have 
meaning within the process from which they were gathered.

In other words, there's no need to ensure globally consistent ordering between 
processes.  Unless I'm missing something?
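
To make the local-only point concrete, here is a minimal Python model of two
processes whose index 0 refers to different devices.  It does not call MPI_T;
the tables, device names, and counter values are invented for illustration.
A tool that reads values inside one process and labels them by device name
never needs the indices to agree across processes.

```python
# Hypothetical per-process pvar tables: the position in each list stands in
# for the process-local index. Names and counter values are made up.
PROC_A = [("usnic_0", 120), ("usnic_1", 80)]
PROC_B = [("usnic_3", 45), ("usnic_0", 200)]


def read_by_index(table, index):
    """Return (device name, value) for a local index within one process."""
    return table[index]


def label_by_name(table):
    """Re-key one process's values by device name, dropping the local index."""
    return {name: value for name, value in table}


if __name__ == "__main__":
    # Index 0 names a different device in each process.
    print(read_by_index(PROC_A, 0))
    print(read_by_index(PROC_B, 0))
    # Keyed by name, the values are unambiguous within each process.
    print(label_by_name(PROC_B))
```

Under this model, comparing "index 0" across processes is meaningless, while
each process's own name-to-value mapping is still self-consistent.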

> As
> an example some of our clusters have 1 NIC on some nodes, and 2 on
> others. Of course we can say we don't guarantee consistent naming, but
> for tools trying to understand communication issues on distributed
> environments having a global view is a clear plus.

A good point.  But even with globally consistent ordering, you don't know that 
usnic_0 in process A communicates with usnic_0 in process B (indeed, we run 
some QA cases here at Cisco where we deliberately ensure that usnic_X in 
process A is on the same subnet as usnic_Y in process B, where X!=Y, and 
everything still works properly).
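
The QA setup described above can be modeled with a short sketch, assuming
reachability is decided by subnet rather than by device index.  This is only
an illustration using Python's standard ipaddress module, not the usnic BTL's
actual matching logic; the addresses and the reachable_pairs helper are
hypothetical.

```python
import ipaddress

# Hypothetical interface addresses for two processes. The subnets are chosen
# so that usnic_0 in A shares a subnet with usnic_1 in B, and vice versa.
PROC_A = {"usnic_0": "10.1.0.5/24", "usnic_1": "10.2.0.5/24"}
PROC_B = {"usnic_0": "10.2.0.9/24", "usnic_1": "10.1.0.9/24"}


def reachable_pairs(local, remote):
    """Pair local and remote devices whose addresses share a subnet."""
    pairs = []
    for lname, laddr in sorted(local.items()):
        lnet = ipaddress.ip_interface(laddr).network
        for rname, raddr in sorted(remote.items()):
            if ipaddress.ip_interface(raddr).network == lnet:
                pairs.append((lname, rname))
    return pairs


if __name__ == "__main__":
    print(reachable_pairs(PROC_A, PROC_B))
```

With these made-up addresses, every pairing crosses indices (X != Y), so
agreeing on device names across processes would not tell a tool which
devices actually communicate.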

> Another question is about the level of details. I wonder if this level
> of details is really needed, or providing the aggregate pvar will be
> enough in most cases. The problem I see here is the lack of
> topological knowledge at the upper level. Seeing a large number of
> messages on a particular BTL might suggest that something is wrong
> inside the implementation, when in fact the BTL is the only one
> connecting a subset of peers. Without us exposing this information,
> I'm afraid the tool might get the wrong picture ...

I think network-level information can only give indirect hints about the 
upper-layer MPI semantics.  However, these counters were not intended to 
convey MPI-application-level semantic information; they were intended to 
expose what is happening on your underlying network -- something that 
OS-bypass networks don't otherwise provide.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
