I really don't like our show_help-at-every-level behavior (look at what
happens when MPI_INIT fails: you get a page per process of the same error
message from each level of the call stack).  If you want to show_help and
abort in a debug build, that makes sense.  It doesn't make any sense in a
production build.  Return an error code and let the upper layer deal with
it.

Brian
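
A minimal sketch of the policy described above (show_help and abort in a
debug build, silently return an error code in a production build).  All
names here (SKETCH_ENABLE_DEBUG, sketch_show_help, sketch_check_reregister,
and the error constants) are hypothetical stand-ins, not the actual OPAL
API; the real logic lives in opal/mca/base/mca_base_param.c and may differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for OPAL_ENABLE_DEBUG: 0 = production, 1 = debug. */
#ifndef SKETCH_ENABLE_DEBUG
#define SKETCH_ENABLE_DEBUG 0
#endif

/* Hypothetical status codes, not the real OPAL constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_TYPE_MISMATCH = -1 };

typedef enum { SKETCH_TYPE_INT, SKETCH_TYPE_STRING } sketch_type_t;

/* Stand-in for a show_help-style message: prints one line to stderr. */
static void sketch_show_help(const char *name)
{
    fprintf(stderr,
            "MCA param \"%s\" was re-registered with a different type\n",
            name);
}

/* Check a re-registration.  A matching type is fine.  On a mismatch,
 * a debug build complains and exits; a production build stays silent
 * and returns an error code for the caller to handle. */
static int sketch_check_reregister(const char *name,
                                   sketch_type_t old_type,
                                   sketch_type_t new_type)
{
    if (old_type == new_type) {
        return SKETCH_SUCCESS;
    }
    if (SKETCH_ENABLE_DEBUG) {
        sketch_show_help(name);
        exit(1);
    }
    return SKETCH_ERR_TYPE_MISMATCH;
}
```

With SKETCH_ENABLE_DEBUG left at 0, a caller that passes mismatched types
gets SKETCH_ERR_TYPE_MISMATCH back and nothing is printed; building with
-DSKETCH_ENABLE_DEBUG=1 turns the same mismatch into a message plus exit(1).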

On 11/2/11 11:27 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

>Brian: you were the one that had an allergic reaction to #1 on the call.
>
>Thoughts?
>
>
>On Nov 2, 2011, at 1:23 PM, George Bosilca wrote:
>
>> As has been said, this is not something that is supposed to make it into a
>>release. In the unfortunate case where it does, always having a
>>show_help will ensure a quick complaint on one of our mailing lists and
>>increase the probability of a [very] quick fix.
>> 
>>   george.
>> 
>> On Nov 2, 2011, at 06:26 , TERRY DONTJE wrote:
>> 
>>> 
>>> 
>>> On 11/1/2011 7:48 PM, Jeff Squyres wrote:
>>>> So this was slightly different than the opinion that was discussed on
>>>>the call today, which was 2.  The rationale for #2 was to punish
>>>>developers, but if such a bug did make it through to production, users
>>>>wouldn't be annoyed with show_help messages all the time.
>>>> 
>>>> Does anyone have strong opinions here?  I don't.
>>>> 
>>>> I offer the following two points:
>>>> 
>>>> - this is a coding error on the part of the OMPI developer
>>>> - it's pretty rare
>>>> 
>>>> 
>>> I think a show_help + return is very helpful in this case.  I wouldn't
>>>think that we'd run into this case that much and it would seem that it
>>>would be a rare occurrence that one could just fix when they run into
>>>it.  However, since there was some opposition to having show_help
>>>messages possibly coming up all over the place, I thought a fallback
>>>of only doing the show_help on enable_debug builds was a
>>>reasonable middle ground.
>>> 
>>> --td
>>>> On Nov 1, 2011, at 7:30 PM, George Bosilca wrote:
>>>> 
>>>> 
>>>>> 1
>>>>> 
>>>>>  george.
>>>>> 
>>>>> On Nov 1, 2011, at 17:23 , Jeff Squyres wrote:
>>>>> 
>>>>> 
>>>>>> Can you clarify -- I can parse your text multiple ways.  Which are
>>>>>>you voting for?
>>>>>> 
>>>>>> 1. show_help + return error code in all cases.
>>>>>> 2. if OPAL_ENABLE_DEBUG, show_help + exit(1), else silently return
>>>>>>error code.
>>>>>> 3. show_help.  if OPAL_ENABLE_DEBUG, exit(1), else return error
>>>>>>code.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Nov 1, 2011, at 4:50 PM, George Bosilca wrote:
>>>>>> 
>>>>>> 
>>>>>>> This is a much saner solution. We [mostly] stayed away from
>>>>>>>calling exit deep in our libraries; there is no reason to add it
>>>>>>>now. I'll vote in favor of show_help + return code.
>>>>>>> 
>>>>>>> george.
>>>>>>> 
>>>>>>> On Nov 1, 2011, at 15:14 , Jeff Squyres wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> We talked about this on the call today.
>>>>>>>> 
>>>>>>>> A good suggestion was made: call show_help/opal_finalize/exit
>>>>>>>>only when OPAL_ENABLE_DEBUG is true.  Otherwise, return an error
>>>>>>>>code.
>>>>>>>> 
>>>>>>>> If no one objects to this, I'll commit this tomorrow.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 31, 2011, at 4:16 PM, Jeff Squyres wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> WHAT: what to do if registering an MCA param results in an error?
>>>>>>>>> 
>>>>>>>>> WHERE: opal/mca/base/mca_base_param.c
>>>>>>>>> 
>>>>>>>>> WHY: MCA param re-registration issues should be treated as OMPI
>>>>>>>>>developer errors
>>>>>>>>> 
>>>>>>>>> WHEN: COB Friday, 4 Nov 2011
>>>>>>>>> 
>>>>>>>>> -----------------
>>>>>>>>> 
>>>>>>>>> Short version:
>>>>>>>>> 
>>>>>>>>> Re-registering an MCA param to be a different type (e.g., it was
>>>>>>>>>initially registered to be a string, but was later re-registered
>>>>>>>>>to be an int) should be treated as an OMPI developer error, and
>>>>>>>>>should opal_finalize()/exit(1).
>>>>>>>>> 
>>>>>>>>> More details:
>>>>>>>>> 
>>>>>>>>> A mistaken MCA param re-registration recently caused an orted
>>>>>>>>>segv.
>>>>>>>>> 
>>>>>>>>> The MCA param subsystem was fixed to avoid this segv, but it now
>>>>>>>>>silently converts the MCA param to the newly-registered type.
>>>>>>>>>Upon reflection and some discussion, this seems to be a bad idea.
>>>>>>>>> Instead, we should loudly complain via a show_help message and
>>>>>>>>>then exit(1).
>>>>>>>>> 
>>>>>>>>> Specifically: this kind of behavior is clearly an error and
>>>>>>>>>should be fixed.  Unfortunately, in most cases, we don't actually
>>>>>>>>>check the return value from MCA param registration functions, so
>>>>>>>>>if we change the MCA param function to simply return a
>>>>>>>>>non-OPAL_SUCCESS status, it's unlikely that anyone will notice until
>>>>>>>>>some code tries to read the param value, likely still resulting
>>>>>>>>>in a segv.
>>>>>>>>> 
>>>>>>>>> Does anyone have heartburn if I change the error behavior to
>>>>>>>>>opal_finalize()/exit(1)?
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> Jeff Squyres
>>>>>>>>> 
>>>>>>>>> jsquy...@cisco.com
>>>>>>>>> 
>>>>>>>>> For corporate legal information go to:
>>>>>>>>> 
>>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> 
>>>>>>>>> de...@open-mpi.org
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> -- 
>>> Terry D. Dontje | Principal Software Engineer
>>> Developer Tools Engineering | +1.781.442.2631
>>> Oracle - Performance Technologies
>>> 95 Network Drive, Burlington, MA 01803
>>> Email terry.don...@oracle.com
>>> 
>>> 
>>> 
>
>


-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories





