Actually, I think that if you build your job with one kernel version and run it on nodes that have another version, rdmacm will be the smallest of your problems.
Anyway, here is the revision that fixes the issue.

------------------------------------------------------------------------
r31194 | vasily | 2014-03-24 15:36:04 +0200 (Mon, 24 Mar 2014) | 3 lines

BTL/OPENIB: remove AC_RUN_IFELSE from configure and check AF_IB support by lib rdmacm during component_init.


------------------------------------------------------------------------

Thank you,
Vasily.

On 13-Mar-14 15:44, Ralph Castain wrote:
I think the critical point is this one:

To be clear: whether AF_IB works or not is a determination to make on the 
machines on which you *run* -- NOT on the machine on which you *build*.
Many of our users compile on the frontend node of their cluster, which doesn't 
even have an IB NIC installed (only the libraries are present, so that they can 
compile). You need to test this at run time to ensure you are on a machine 
where rdmacm can actually run.
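As a rough illustration of what "test this at run time" can mean, here is a minimal C sketch, assuming a Linux node where the kernel lists RDMA devices under /sys/class/infiniband. It only detects whether an RDMA device is present on the node that is actually running the job (a NIC-less frontend reports 0); it does not prove that rdmacm or AF_IB works, which would still need a real librdmacm call. The function name count_rdma_devices is hypothetical.

```c
#include <dirent.h>
#include <string.h>

/* Assumption: Linux sysfs lists one entry per RDMA device here. */
#define RDMA_SYSFS_DIR "/sys/class/infiniband"

/* Hypothetical helper: count the entries in `dir`, skipping "." and "..".
 * Returns 0 when the directory does not exist, e.g. on a build/frontend
 * node that has the IB libraries installed but no IB NIC. */
static int count_rdma_devices(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL) {
        return 0; /* no sysfs class directory: no RDMA devices visible */
    }

    int n = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0) {
            n++;
        }
    }
    closedir(d);
    return n;
}
```

Because the check runs when the program runs, the same binary gives different answers on the frontend and on the compute nodes, which is the property a configure-time test cannot have.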


On Mar 13, 2014, at 5:53 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

On Mar 13, 2014, at 4:59 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

Right?  If so, I don't see why you need the AC_TRY_RUN -- if RDMACM is easily 
detectable as to which way it is compiled (because it has, for example, 
different fields), then AC_CHECK_DECLS should be enough, right?
RDMACM API has different implementation requirements for its providers, tcp and 
af_ib (different structs/fields must be used/passed, and different APIs/hooks 
must be called for bring-up).
Yes, this was said before.  Which is why I don't understand why AC_CHECK_DECLS 
isn't enough -- it's a compile-time check, right?

Let me get this straight:

1. AF_IB may or may not be present.
2. If AF_IB is present, it may or may not work (i.e., support for AF_IB may or 
may not work in the kernel).
3. If AF_IB is present, you can only compile with the AF_IB fields and methods.
4. If AF_IB is not present, you can only compile with the non-AF_IB fields and 
methods.

I think #2 is not relevant for configure -- only #1, #3, and #4 are relevant.  
So you should have code something like this:

#if HAVE_DECL_AF_IB
    ret = do_the_stuff_with_af_ib(...);
    if (OMPI_SUCCESS != ret) {
        opal_show_help(...AF_IB doesn't work...);
        return ret;
    }
#else
    ret = do_the_stuff_without_af_ib(...);
    if (OMPI_SUCCESS != ret) {
        opal_show_help(...non-AF_IB doesn't work...);
        return ret;
    }
#endif

To be clear: whether AF_IB works or not is a determination to make on the 
machines on which you *run* -- NOT on the machine on which you *build*.

This is one of the key reasons that OMPI prefers run-time detection for 
run-time characteristics over configure-time detection for run-time 
characteristics (because you may run OMPI on different machines than where you 
built OMPI).
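Jeff's skeleton can be combined with the run-time check that the r31194 commit message describes (checking AF_IB support during component_init). The sketch below is a minimal illustration under stated assumptions, not the actual openib code: HAVE_DECL_AF_IB (assumed to be defined to 0 or 1 by an AC_CHECK_DECLS test) only selects which flavor is compiled in, and a probe executed on the run node decides whether that flavor is usable. SKETCH_SUCCESS, SKETCH_ERR_NOT_AVAILABLE, cm_probe_fn, rdmacm_component_init and the two probe stubs are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption: configure's AC_CHECK_DECLS([AF_IB], ...) defines
 * HAVE_DECL_AF_IB to 1 or 0.  Default it so this sketch builds stand-alone. */
#ifndef HAVE_DECL_AF_IB
#define HAVE_DECL_AF_IB 0
#endif

#if HAVE_DECL_AF_IB
#define CM_FLAVOR "AF_IB"     /* build-time headers declare AF_IB */
#else
#define CM_FLAVOR "non-AF_IB" /* build-time headers lack AF_IB */
#endif

/* Hypothetical stand-ins for OMPI_SUCCESS and an OMPI error code. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERR_NOT_AVAILABLE (-1)

/* Hypothetical run-time probe: true if the compiled-in flavor actually works
 * on the node we are running on.  A real component would exercise librdmacm
 * here; it is passed in so the decision logic can be tested without IB. */
typedef bool (*cm_probe_fn)(void);

/* Decide at component_init time (on the run node), not at configure time
 * (on the build node), whether rdmacm can be used. */
static int rdmacm_component_init(cm_probe_fn probe)
{
    assert(probe != NULL);
    if (!probe()) {
        /* Disqualify cleanly instead of failing later during wire-up. */
        fprintf(stderr, "rdmacm (%s) is not usable on this node; disabling it\n",
                CM_FLAVOR);
        return SKETCH_ERR_NOT_AVAILABLE;
    }
    return SKETCH_SUCCESS;
}

/* Illustrative probes for a node where the check passes / fails. */
static bool probe_works(void) { return true; }
static bool probe_fails(void) { return false; }
```

Building with -DHAVE_DECL_AF_IB=1 changes only which flavor is compiled in; whether it is used is still decided by the probe on each node where the job runs.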

Currently, the RDMACM provider can be selected at compile time only, and mpirun 
becomes incompatible with other RDMACM providers.
What does mpirun have to do with this?  We're talking about the openib BTL, 
right?

AC_TRY_RUN is a protection that selected provider will be able to run, otherwise 
no fallback to other provider will be available for user at runtime.
I can't parse this statement...?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/03/14342.php
_______________________________________________
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/03/14343.php