On Nov 1, 2011, at 5:48 PM, Jeff Squyres wrote:

> So this was slightly different from the opinion that was discussed on the 
> call today, which was option 2.  The rationale for #2 was to punish 
> developers, but if such a bug did make it through to production, users 
> wouldn't be annoyed with show_help messages all the time.
> 
> Does anyone have strong opinions here?  I don't.

Not strong opinions, no. Just noting that unless someone goes back and adds 
return status checks on every param registration (there are quite a few, and 
almost none of them check the return code; I haven't found one that does), 
then this problem will continue to go undetected unless you hit the "bad" code 
path. I'd rather ensure we discover it during devel instead of in production, 
but as you say, it is pretty rare.
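
For illustration, here is a minimal sketch of what "adding a return status 
check" around a param registration could look like. It is not the code in 
opal/mca/base/mca_base_param.c: every name below (sketch_param_register, 
SKETCH_ERR_TYPE_CLASH, the table layout, and so on) is hypothetical, and the 
fprintf only stands in for a show_help message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real OPAL constants or mca_base_param API. */
#define SKETCH_SUCCESS        0
#define SKETCH_ERR_TYPE_CLASH (-1)
#define SKETCH_ERR_FULL       (-2)
#define SKETCH_MAX_PARAMS     16

typedef enum { SKETCH_TYPE_INT, SKETCH_TYPE_STRING } sketch_type_t;

typedef struct {
    const char   *name;   /* pointer is stored as-is; caller keeps it alive */
    sketch_type_t type;
} sketch_param_t;

static sketch_param_t sketch_table[SKETCH_MAX_PARAMS];
static int sketch_count = 0;

/* Register a parameter.  Re-registering with the same type is harmless;
 * re-registering with a different type is reported and rejected. */
static int sketch_param_register(const char *name, sketch_type_t type)
{
    assert(name != NULL);

    for (int i = 0; i < sketch_count; ++i) {
        if (strcmp(sketch_table[i].name, name) == 0) {
            if (sketch_table[i].type == type) {
                return SKETCH_SUCCESS;
            }
            /* Stand-in for a show_help message. */
            fprintf(stderr,
                    "param \"%s\" re-registered with a different type\n",
                    name);
            return SKETCH_ERR_TYPE_CLASH;
        }
    }

    if (sketch_count == SKETCH_MAX_PARAMS) {
        return SKETCH_ERR_FULL;
    }
    sketch_table[sketch_count].name = name;
    sketch_table[sketch_count].type = type;
    ++sketch_count;
    return SKETCH_SUCCESS;
}

/* A caller that propagates the status instead of dropping it. */
static int sketch_component_open(void)
{
    int rc = sketch_param_register("example_verbose", SKETCH_TYPE_INT);
    if (rc != SKETCH_SUCCESS) {
        return rc;
    }

    /* Deliberate developer bug: same name, different type. */
    rc = sketch_param_register("example_verbose", SKETCH_TYPE_STRING);
    if (rc != SKETCH_SUCCESS) {
        return rc;
    }
    return SKETCH_SUCCESS;
}
```

In this sketch the mismatch surfaces at registration time in 
sketch_component_open(), rather than later when some code reads the param.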

> 
> I offer the following two points:
> 
> - this is a coding error on the part of the OMPI developer
> - it's pretty rare
> 
> 
> 
> On Nov 1, 2011, at 7:30 PM, George Bosilca wrote:
> 
>> 1
>> 
>> george.
>> 
>> On Nov 1, 2011, at 17:23 , Jeff Squyres wrote:
>> 
>>> Can you clarify -- I can parse your text multiple ways.  Which are you 
>>> voting for?
>>> 
>>> 1. show_help + return error code in all cases.
>>> 2. if OPAL_ENABLE_DEBUG, show_help + exit(1), else silently return error 
>>> code.
>>> 3. show_help.  if OPAL_ENABLE_DEBUG, exit(1), else return error code.
>>> 
>>> 
>>> 
>>> On Nov 1, 2011, at 4:50 PM, George Bosilca wrote:
>>> 
>>>> This is a much saner solution. We [mostly] stayed away from calling exit 
>>>> deep inside our libraries; there is no reason to add it now. I'll vote in 
>>>> favor of show_help + return code.
>>>> 
>>>> george.
>>>> 
>>>> On Nov 1, 2011, at 15:14 , Jeff Squyres wrote:
>>>> 
>>>>> We talked about this on the call today.
>>>>> 
>>>>> A good suggestion was made: call show_help/opal_finalize/exit only when 
>>>>> OPAL_ENABLE_DEBUG is true.  Otherwise, return an error code.
>>>>> 
>>>>> If no one objects to this, I'll commit this tomorrow.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Oct 31, 2011, at 4:16 PM, Jeff Squyres wrote:
>>>>> 
>>>>>> WHAT: what to do if registering an MCA param results in an error?
>>>>>> 
>>>>>> WHERE: opal/mca/base/mca_base_param.c
>>>>>> 
>>>>>> WHY: MCA param re-registration issues should be treated as OMPI 
>>>>>> developer errors
>>>>>> 
>>>>>> WHEN: COB Friday, 4 Nov 2011
>>>>>> 
>>>>>> -----------------
>>>>>> 
>>>>>> Short version: 
>>>>>> 
>>>>>> Re-registering an MCA param to be a different type (e.g., it was 
>>>>>> initially registered to be a string, but was later re-registered to be 
>>>>>> an int) should be treated as an OMPI developer error, and should 
>>>>>> opal_finalize()/exit(1).
>>>>>> 
>>>>>> More details:
>>>>>> 
>>>>>> A mistaken MCA param re-registration recently caused an orted segv.
>>>>>> 
>>>>>> The MCA param subsystem was fixed to avoid this segv, but it now 
>>>>>> silently converts the MCA param to the newly-registered type.  Upon 
>>>>>> reflection and some discussion, this seems to be a bad idea.  Instead, 
>>>>>> we should loudly complain via a show_help message and then exit(1).
>>>>>> 
>>>>>> Specifically: this kind of behavior is clearly an error and should be 
>>>>>> fixed.  Unfortunately, in most cases, we don't actually check the return 
>>>>>> value from MCA param registration functions, so if we change the MCA 
>>>>>> param function to simply return a non-OPAL_SUCCESS status, it's unlikely 
>>>>>> that anyone will notice until some code tries to read the param value, 
>>>>>> likely still resulting in a segv.
>>>>>> 
>>>>>> Does anyone have heartburn if I change the error behavior to 
>>>>>> opal_finalize()/exit(1)?
>>>>>> 
>>>>>> -- 
>>>>>> Jeff Squyres
>>>>>> jsquy...@cisco.com
>>>>>> For corporate legal information go to:
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

