On Mar 5, 2010, at 12:55 PM, Jeff Squyres wrote:

> On Mar 5, 2010, at 2:34 PM, George Bosilca wrote:
> 
>> Being user friendly is good, being way too user friendly is less so (but I 
>> guess this is the price we have to pay for production-quality code, isn't 
>> it?).
> 
> Agreed.  None of these messages appear except in error cases or if you crank 
> up the verbosity.  The use case for this was a user (more than one, actually) 
> who had problems with the TCP BTL deciding not to connect to peers for some 
> reason.  But there was no way to know exactly what the BTL was *trying* to do 
> -- all you got was (effectively), "Sorry, I couldn't connect."  So the main 
> impetus for this was to give some visibility into what the TCP BTL is doing 
> when it tries to connect -- you can see if it's trying to use private IP 
> addresses by mistake, or somesuch.
> 
>> I have a few comments:
>> 
>> - In several places you replaced the BTL_ERROR (which was the way BTLs are 
>> supposed to complain) by a call directly to orte_show_help. This presents 
>> several inconveniences: drifting away from something more or less consistent 
>> across all BTLs, adding more dependencies between the BTLs and ORTE.
> 
> I have never found BTL_ERROR to be terribly helpful.  It is essentially an 
> fprintf -- it doesn't propagate errors upward or anything.  I tend to prefer 
> show_help because then you can provide a meaningful error message -- and 
> duplicate messages are not displayed (a feature that many people have told 
> me they love).  BTL_ERROR just guarantees 
> that the user will have to email us to figure out what's going on because the 
> messages aren't meaningful to anyone other than an OMPI developer.

I'm not sure I understand this concern either, especially the latter one about 
orte dependency. There already are 5 calls to orte_show_help in this btl, along 
with several references to orte_process_info and other orte elements. What harm 
is done by adding more calls to orte_show_help?

I better understand the BTL_ERROR issue, but it raises the question as to 
whether BTL_ERROR should be defined as an orte_show_help call. That might help 
reduce the flood of duplicate messages when an error occurs.
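
To make that idea concrete, here is a minimal sketch of an error macro routed through a reporter that drops duplicates. Everything in it is hypothetical: sketch_report_once and SKETCH_BTL_ERROR are made-up names, not the real BTL_ERROR or orte_show_help, and duplicates are detected by comparing the formatted text, which is only an assumption for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sizes for the sketch; not taken from the OMPI code base. */
#define SKETCH_MAX_SEEN 64
#define SKETCH_MSG_LEN  256

/* Messages that have already been printed once. */
static char sketch_seen[SKETCH_MAX_SEEN][SKETCH_MSG_LEN];
static int sketch_num_seen = 0;

/*
 * Hypothetical stand-in for a show_help-style reporter.
 * Formats the message, prints it to stderr the first time it is seen,
 * and suppresses later identical messages.
 * Returns 1 if the message was printed, 0 if it was suppressed.
 */
static int sketch_report_once(const char *fmt, ...)
{
    char msg[SKETCH_MSG_LEN];
    va_list ap;
    int i;

    va_start(ap, fmt);
    vsnprintf(msg, sizeof(msg), fmt, ap);
    va_end(ap);

    /* Duplicate check: exact match on the formatted text. */
    for (i = 0; i < sketch_num_seen; ++i) {
        if (0 == strcmp(sketch_seen[i], msg)) {
            return 0;
        }
    }

    /* Remember the message if there is room; if the table is full,
     * the message is still printed but can no longer be deduplicated. */
    if (sketch_num_seen < SKETCH_MAX_SEEN) {
        strncpy(sketch_seen[sketch_num_seen], msg, SKETCH_MSG_LEN - 1);
        sketch_seen[sketch_num_seen][SKETCH_MSG_LEN - 1] = '\0';
        ++sketch_num_seen;
    }

    fprintf(stderr, "%s\n", msg);
    return 1;
}

/* Hypothetical error macro defined in terms of the reporter above. */
#define SKETCH_BTL_ERROR(...) sketch_report_once(__VA_ARGS__)
```

With this sketch, calling SKETCH_BTL_ERROR("connect to %s failed", "10.0.0.1") twice prints the message once, while the same format with a different peer address prints again.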

> 
>> - There are a lot of places where you just indented the code or split a 
>> medium-sized line into several lines. I find the code more difficult to read.
> 
> Ja; I did re-indent some code because I found it hard to read the super-long 
> lines while trying to figure out the TCP BTL code.  Sorry about that.  
> 
> You do the same thing sometimes, too.  ;-)
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/