On Dec 15, 2013, at 15:40, Ralph Castain <r...@open-mpi.org> wrote:

> Not true, George - look more closely at the code. We only retrieve the 
> hostname if the number of procs is low. Otherwise, we do *not* retrieve it 
> until we do a modex_recv, and thus the debug is now broken at scale. This was 
> required for scalable launch, which is something I know is important to you 
> as well.

Sure, if you trust the comment in the file. Unfortunately the comment is wrong: 
nobody is setting the hostname of the procs we’re talking about.

Moreover, the real meaning of the cutoff parameter is clearly defined in the 
snippet below:

> r29052
> As per the email discussion, revise the sparse handling of hostnames so
> that we avoid potential infinite loops while allowing large-scale users to
> improve their startup time:
> 
> * add a new MCA param orte_hostname_cutoff to specify the number of nodes
> at which we stop including hostnames. This defaults to INT_MAX => always
> include hostnames. If a value is given, then we will include hostnames for
> any allocation smaller than the given limit.
> 
> * remove ompi_proc_get_hostname. Replace all occurrences with a direct
> link to ompi_proc_t’s proc_hostname, protected by appropriate "if NULL"

The comment above is about scalability.
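
To make the two points of that log concrete, here is a minimal sketch in C. It
is not the actual OMPI code: proc_t, hostname_cutoff, include_hostnames and
proc_hostname_or_unknown are hypothetical names used only for illustration.
Only the proc_hostname field, the INT_MAX default, and the "smaller than the
given limit" rule are taken from the commit message.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for ompi_proc_t; only proc_hostname comes from the log. */
typedef struct {
    const char *proc_hostname;  /* NULL when the hostname was not shipped */
} proc_t;

/* Hypothetical mirror of orte_hostname_cutoff.
 * Default per the log: INT_MAX => always include hostnames. */
static int hostname_cutoff = INT_MAX;

/* Hostnames are included for any allocation smaller than the cutoff. */
static int include_hostnames(int num_nodes)
{
    return num_nodes < hostname_cutoff;
}

/* The "if NULL" protection: never dereference a missing hostname. */
static const char *proc_hostname_or_unknown(const proc_t *proc)
{
    return (NULL != proc->proc_hostname) ? proc->proc_hostname : "unknown";
}
```

With the default cutoff every allocation gets hostnames; once a smaller cutoff
is set, callers above the limit have to be ready for a NULL proc_hostname, and
that case is what the guard covers.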

> Modifying the API isn't a big deal, so why the fuss? Let's just change it and 
> get the debug working again.


Here is how I see things. I made a change to remove a deadlock and maintain 
the scalability of the codebase, a change that does not affect the normal use 
of the OMPI debug facility for most users. From here on, feel free to 
improve on the existing code as much as you feel necessary, as long as you 
maintain the above properties. Enough has been said about this topic; I will 
now pursue my other interests.

Thanks,
  George.
