Oh, good catch -- thanks.

I wouldn't call abort -- that will dump core.  Just show_help() and 
exit(nonzero), I guess.
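
For illustration only, a minimal sketch of that approach. It is not the actual sm BTL code: show_help_stub() is a hypothetical stand-in for opal_show_help(), and the knem_available flag and both function names are assumptions made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for opal_show_help(): the real call looks up a
   topic in a help file; this stub just prints the topic to stderr. */
static void show_help_stub(const char *topic, const char *nodename)
{
    fprintf(stderr, "[%s] %s\n", nodename, topic);
}

/* Returns 1 when the user explicitly asked for knem (use_knem > 0)
   but knem support is not available; returns 0 otherwise. */
static int knem_request_is_fatal(int use_knem, int knem_available)
{
    return use_knem > 0 && !knem_available;
}

/* Init-time check: print the help message and exit with a nonzero
   status instead of calling abort(), so no core file is produced. */
static void check_knem_or_exit(int use_knem, int knem_available,
                               const char *nodename)
{
    if (knem_request_is_fatal(use_knem, knem_available)) {
        show_help_stub("knem requested but not available", nodename);
        exit(EXIT_FAILURE);
    }
}
```

Splitting the decision (knem_request_is_fatal) from the exit keeps the policy checkable without killing the process.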


On Dec 4, 2014, at 3:31 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> You can't use the PML error reporting mechanism in this particular instance: 
> it is too early in the setup process (in the BTL component init function) and 
> the PML has not set up the error callback yet.
> 
> This function is called during MPI_Init, at a time when most of the Open 
> MPI infrastructure is not yet set up. I guess the safest way to force the 
> process to fail is to call exit or maybe abort.
> 
> George.
> 
> 
> 
> On Fri, Dec 5, 2014 at 3:40 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> You're supposed to call the PML error handler, which was passed down to the 
> BTL during initialization.
> 
> That is, the BTL registers a btl_register_error function with the PML.  The 
> PML then calls this function and passes in its error handler function 
> pointer.  The BTL can then use that error handler to tell the PML when an 
> error occurs.
> 
> Right now, the only PML error handler aborts the job.  So this should be a 
> sufficient mechanism.
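
The registration pattern described above can be sketched roughly as follows. This is a simplified illustration, not the real Open MPI interface: the struct, the callback signature, and the counting handler are all hypothetical, and the real btl_register_error prototype differs. It also shows why the callback is unusable early on: until the upper layer registers it, the pointer is still NULL.

```c
#include <stddef.h>

/* Hypothetical, simplified callback type; the real PML error callback
   takes different arguments. */
typedef void (*error_cb_fn)(const char *reason);

/* Hypothetical, simplified BTL module holding the registered callback. */
struct btl_module {
    error_cb_fn error_cb;   /* NULL until the upper layer registers one */
};

/* Called by the upper layer (the PML) to hand its error handler down. */
static void btl_register_error(struct btl_module *btl, error_cb_fn cb)
{
    btl->error_cb = cb;
}

/* Report an error upward. Returns 1 if the callback was invoked, or 0 if
   none is registered yet (e.g. during component init), in which case the
   caller needs another way to fail. */
static int btl_report_error(const struct btl_module *btl, const char *reason)
{
    if (btl->error_cb == NULL) {
        return 0;
    }
    btl->error_cb(reason);
    return 1;
}

/* Illustrative handler that only counts errors; per the thread, the
   only real PML handler aborts the job. */
static int errors_seen = 0;

static void counting_handler(const char *reason)
{
    (void)reason;
    errors_seen++;
}
```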
> 
> 
> On Dec 3, 2014, at 12:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> > We talked during the telecon about the user-reported issue where they asked 
> > for knem support, it wasn’t available on the system, but we ran anyway at a 
> > reduced performance level. The agreement we had was that OMPI should 
> > instead fail at that point since the user had requested something we could 
> > not do. I got tasked with implementing this.
> >
> > Here is the problem code:
> >
> >    /* If "use_knem" is positive, then it's an error if knem support
> >       is not available -- deactivate the sm btl. */
> >    if (mca_btl_sm_component.use_knem > 0) {
> >        opal_show_help("help-mpi-btl-sm.txt",
> >                       "knem requested but not available",
> >                       true, opal_process_info.nodename);
> >        return NULL;
> >    }
> >
> > As you can see, we deactivate sm but do not necessarily fail. Question for 
> > you folks: how do I cause us to safely fail from within a BTL??
> >
> > Thanks
> > Ralph
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/12/16425.php
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
