You're supposed to call the PML error handler, which was passed down to the BTL 
during initialization.

That is, the BTL provides a btl_register_error function.  The PML calls this 
function during initialization and passes in a pointer to its error handler.  
The BTL can then invoke that error handler to tell the PML when an error occurs.
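
To make the handshake concrete, here is a rough sketch of the pattern in 
standalone C.  Everything prefixed with sketch_ is a hypothetical stand-in I 
made up for illustration; the real Open MPI types, signatures, and flags are 
different, so treat this as the shape of the mechanism rather than code to 
paste into a BTL.  The real PML handler aborts the job; the one here only 
counts and logs so the flow can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: the real Open MPI types and signatures differ. */
struct sketch_btl;

/* Error callback that the PML hands down to the BTL. */
typedef void (*sketch_error_cb_fn)(struct sketch_btl *btl, int fatal,
                                   const char *info);

struct sketch_btl {
    const char *name;
    sketch_error_cb_fn error_cb;  /* filled in when the PML registers */
    int (*register_error)(struct sketch_btl *btl, sketch_error_cb_fn cb);
};

/* BTL side: remember the handler the PML passed in. */
static int sketch_btl_register_error(struct sketch_btl *btl,
                                     sketch_error_cb_fn cb)
{
    assert(btl != NULL);
    btl->error_cb = cb;
    return 0;
}

/* PML side: count of errors reported so far (for illustration only). */
static int sketch_pml_errors_seen = 0;

/* PML side: the error handler.  A real handler would abort the job. */
static void sketch_pml_error_handler(struct sketch_btl *btl, int fatal,
                                     const char *info)
{
    sketch_pml_errors_seen++;
    fprintf(stderr, "PML: %s error from BTL %s: %s\n",
            fatal ? "fatal" : "non-fatal", btl->name, info);
}

/* PML side: during initialization, register the handler with the BTL. */
static void sketch_pml_add_btl(struct sketch_btl *btl)
{
    if (btl->register_error != NULL) {
        btl->register_error(btl, sketch_pml_error_handler);
    }
}

/* BTL side: report a fatal error through whatever handler was registered.
   Returns -1 if no handler has been registered yet. */
static int sketch_btl_report_error(struct sketch_btl *btl, const char *info)
{
    if (btl->error_cb == NULL) {
        return -1;
    }
    btl->error_cb(btl, 1, info);
    return 0;
}
```

In this sketch the PML would call sketch_pml_add_btl() once per BTL, and the 
BTL would later call sketch_btl_report_error() at the point where it detects 
the problem.  Note that the report function returns -1 when no handler has 
been registered yet, so the caller needs a fallback for errors that occur 
before registration.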

Right now, the only PML error handler aborts the job.  So this should be a 
sufficient mechanism.


On Dec 3, 2014, at 12:15 PM, Ralph Castain <r...@open-mpi.org> wrote:

> We talked during the telecon about the user-reported issue where they asked 
> for knem support, it wasn’t available on the system, but we ran anyway at a 
> reduced performance level. The agreement we had was that OMPI should instead 
> fail at that point since the user had requested something we could not do. I 
> got tasked with implementing this.
> 
> Here is the problem code:
> 
>    /* If "use_knem" is positive, then it's an error if knem support
>       is not available -- deactivate the sm btl. */
>    if (mca_btl_sm_component.use_knem > 0) {
>        opal_show_help("help-mpi-btl-sm.txt",
>                       "knem requested but not available",
>                       true, opal_process_info.nodename);
>        return NULL;
>    }
> 
> As you can see, we deactivate sm but do not necessarily fail. Question for 
> you folks: how do I cause us to safely fail from within a BTL??
> 
> Thanks
> Ralph
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16425.php


-- 
Jeff Squyres
jsquy...@cisco.com
