I really don't like our show_help-at-every-level behavior (look at what happens when MPI_INIT fails: you get a page per process of the same error message from each level of the call stack). If you want to show_help and abort on debug builds, that makes sense. It doesn't make any sense on a production build; there, return an error code and let the upper layer deal with it.
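To make the proposal concrete, here is a minimal sketch of "complain and abort on debug builds, silently return an error code otherwise" for the type-mismatch re-registration case. Everything in it is an assumption for illustration: the `sketch_*` names, the error codes, and the `SKETCH_ENABLE_DEBUG` macro are hypothetical stand-ins (the macro for OPAL_ENABLE_DEBUG, the function for the MCA param registration path), and `fprintf` stands in for show_help. It is not the actual Open MPI code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for OPAL_ENABLE_DEBUG; define as 1 to model a debug build. */
#ifndef SKETCH_ENABLE_DEBUG
#define SKETCH_ENABLE_DEBUG 0
#endif

/* Hypothetical status codes, not the real OPAL constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_TYPE_MISMATCH = -1, SKETCH_ERR_FULL = -2 };

typedef enum { SKETCH_TYPE_INT, SKETCH_TYPE_STRING } sketch_type_t;

typedef struct {
    const char *name;   /* not copied; the caller must keep the string alive */
    sketch_type_t type;
} sketch_param_t;

#define SKETCH_MAX_PARAMS 16
static sketch_param_t sketch_params[SKETCH_MAX_PARAMS];
static int sketch_num_params = 0;

/* Register a parameter. Re-registering an existing name with a
 * different type is treated as a developer error. */
int sketch_param_register(const char *name, sketch_type_t type)
{
    assert(NULL != name);

    for (int i = 0; i < sketch_num_params; ++i) {
        if (0 == strcmp(sketch_params[i].name, name)) {
            if (sketch_params[i].type == type) {
                return SKETCH_SUCCESS;  /* same type again: harmless */
            }
#if SKETCH_ENABLE_DEBUG
            /* Debug build: complain once, at the point of the error, and abort. */
            fprintf(stderr, "param \"%s\" re-registered with a different type\n", name);
            exit(1);
#else
            /* Production build: print nothing; the upper layer decides what to do. */
            return SKETCH_ERR_TYPE_MISMATCH;
#endif
        }
    }

    if (SKETCH_MAX_PARAMS == sketch_num_params) {
        return SKETCH_ERR_FULL;
    }
    sketch_params[sketch_num_params].name = name;
    sketch_params[sketch_num_params].type = type;
    ++sketch_num_params;
    return SKETCH_SUCCESS;
}
```

With the default `SKETCH_ENABLE_DEBUG=0`, registering "foo" as a string and then as an int returns `SKETCH_ERR_TYPE_MISMATCH` without printing anything; compiling with `-DSKETCH_ENABLE_DEBUG=1` turns the same call into a single message followed by exit(1). This only helps production builds if callers actually check the return value.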
Brian

On 11/2/11 11:27 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Brian: you were the one that had an allergic reaction to #1 on the call.
>
> Thoughts?
>
> On Nov 2, 2011, at 1:23 PM, George Bosilca wrote:
>
>> As it has been said, this is not something supposed to make it in a
>> release. On the unfortunate case where it does, always having a
>> show_help will ensure a quick complaint on one of our mailing lists
>> and increase the probability of a [very] quick fix.
>>
>> george.
>>
>> On Nov 2, 2011, at 06:26 , TERRY DONTJE wrote:
>>
>>> On 11/1/2011 7:48 PM, Jeff Squyres wrote:
>>>> So this was slightly different than the opinion that was discussed
>>>> on the call today, which was 2. The rationale for #2 was to punish
>>>> developers, but if such a bug did make it through to production,
>>>> users wouldn't be annoyed with show_help messages all the time.
>>>>
>>>> Does anyone have strong opinions here? I don't.
>>>>
>>>> I offer the following two points:
>>>>
>>>> - this is a coding error on the OMPI developer
>>>> - it's pretty rare
>>>>
>>> I think a show_help + return is very helpful in this case. I wouldn't
>>> think that we'd run into this case that much and it would seem that
>>> it would be a rare occurance that one could just fix when they run
>>> into it. However, since there was some opposition to having show_help
>>> messages possibly coming up all over the place I thought a fall
>>> back of only doing the show_help on enable_debug builds was a
>>> reasonable middle ground.
>>>
>>> --td
>>>> On Nov 1, 2011, at 7:30 PM, George Bosilca wrote:
>>>>
>>>>> 1
>>>>>
>>>>> george.
>>>>>
>>>>> On Nov 1, 2011, at 17:23 , Jeff Squyres wrote:
>>>>>
>>>>>> Can you clarify -- I can parse your text multiple ways. Which are
>>>>>> you voting for?
>>>>>>
>>>>>> 1. show_help + return error code in all cases.
>>>>>> 2. if OPAL_ENABLE_DEBUG, show_help + exit(1), else silently return
>>>>>>    error code.
>>>>>> 3. show_help. if OPAL_ENABLE_DEBUG, exit(1), else return error
>>>>>>    code.
>>>>>>
>>>>>> On Nov 1, 2011, at 4:50 PM, George Bosilca wrote:
>>>>>>
>>>>>>> This is a much saner solution. We [mostly] stayed away from
>>>>>>> calling exit deep into our libraries, there is no reason to add
>>>>>>> it now. I'll vote in favor of show_help + return code.
>>>>>>>
>>>>>>> george.
>>>>>>>
>>>>>>> On Nov 1, 2011, at 15:14 , Jeff Squyres wrote:
>>>>>>>
>>>>>>>> We talked about this on the call today.
>>>>>>>>
>>>>>>>> A good suggestion was made: call show_help/opal_finalize/exit
>>>>>>>> only when OPAL_ENABLE_DEBUG is true. Otherwise, return an error
>>>>>>>> code.
>>>>>>>>
>>>>>>>> If no one objects to this, I'll commit this tomorrow.
>>>>>>>>
>>>>>>>> On Oct 31, 2011, at 4:16 PM, Jeff Squyres wrote:
>>>>>>>>
>>>>>>>>> WHAT: what to do if registering an MCA param results in an error?
>>>>>>>>>
>>>>>>>>> WHERE: opal/mca/base/mca_base_param.c
>>>>>>>>>
>>>>>>>>> WHY: MCA param re-registration issues should be treated as OMPI
>>>>>>>>> developer errors
>>>>>>>>>
>>>>>>>>> WHEN: COB Friday, 4 Nov 2011
>>>>>>>>>
>>>>>>>>> -----------------
>>>>>>>>>
>>>>>>>>> Short version:
>>>>>>>>>
>>>>>>>>> Re-registering an MCA param to be a different type (e.g., it was
>>>>>>>>> initially registered to be a string, but was later re-registered
>>>>>>>>> to be an int) should be treated as an OMPI developer error, and
>>>>>>>>> should opal_finalize()/exit(1).
>>>>>>>>>
>>>>>>>>> More details:
>>>>>>>>>
>>>>>>>>> A mistaken MCA param re-registration recently caused an orted
>>>>>>>>> segv.
>>>>>>>>>
>>>>>>>>> The MCA param subsystem was fixed to avoid this segv, but
>>>>>>>>> silently convert the MCA param to the newly-registered type.
>>>>>>>>> Upon reflection and some discussion, this seems to be a bad idea.
>>>>>>>>> Instead, we should loudly complain via a show_help message and
>>>>>>>>> then exit(1).
>>>>>>>>>
>>>>>>>>> Specifically: this kind of behavior is clearly an error and
>>>>>>>>> should be fixed. Unfortunately, in most cases, we don't actually
>>>>>>>>> check the return value from MCA param registration functions, so
>>>>>>>>> if we change the MCA param function to simply return a non
>>>>>>>>> OPAL_SUCCESS status, it's unlikely that anyone will notice until
>>>>>>>>> some code tries to read the param value, likely still resulting
>>>>>>>>> in a segv.
>>>>>>>>>
>>>>>>>>> Does anyone have heartburn if I change the error behavior to
>>>>>>>>> opal_finalize()/exit(1)?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jeff Squyres
>>>>>>>>> jsquy...@cisco.com
>>>>>>>>> For corporate legal information go to:
>>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> de...@open-mpi.org
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>> --
>>> Terry D. Dontje | Principal Software Engineer
>>> Developer Tools Engineering | +1.781.442.2631
>>> Oracle - Performance Technologies
>>> 95 Network Drive, Burlington, MA 01803
>>> Email terry.don...@oracle.com

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories