Mellanox --

I investigated the ibverbs configury issue reported by Paul Hargrove (initial 
post: http://www.open-mpi.org/community/lists/devel/2014/01/13598.php), and it 
looks like it's an oshmem configury issue.  The short version is that oshmem is 
doing some configure tests a) at the wrong time, and b) in the wrong place.

Both things are happening in OSHMEM_SETUP_CFLAGS, which is being invoked very, 
very late in configure.ac:

a) OSHMEM_SETUP_CFLAGS runs after all framework/component setup, during the 
final *FLAGS (e.g., CFLAGS) processing.  By that point, LDFLAGS has been loaded 
with -export-dynamic, which is intended to be consumed by libtool.  But 
OSHMEM_SETUP_CFLAGS then invokes tests that pass LDFLAGS to plain CC, and 
badness can occur.

b) But I'm confused as to the purpose of OSHMEM_SETUP_CFLAGS, anyway:

b1) It's calling OMPI_C_COMPILER_VENDOR([oshmem_c_vendor]).  But I can't find 
where this is used.  Am I missing it?  If not, it should be removed.

b2) The rest of OSHMEM_SETUP_CFLAGS is all verbs-specific (e.g., it calls 
OMPI_CHECK_OPENFABRICS).  It looks like the flags and #define it sets are used 
in mca/memheap/base.  Two issues:

b2a) Tests that are specific to a framework should be in that framework's 
configure.m4 (e.g., oshmem/mca/memheap/configure.m4).  They should not 
(effectively) be in the top-level configure.ac.

b2b) Why is all this verbs-specific stuff in the memheap base?  It seems like 
an abstraction violation -- the whole point of components is to have 
platform-specific code in components, not in the core/base library.  Put 
simply: as a rule of thumb, you shouldn't need to link libibverbs -- or any 
other network stack library -- in the wrapper compilers (when building libmpi 
as a shared library with plugins).  If you do, it means you have network-specific 
code in OMPI's core libraries, and you got the abstractions wrong.

From how I'm currently understanding this, it seems like OSHMEM_SETUP_CFLAGS 
should go away, the tests it is doing should move to a component's 
configure.m4, and the verbs-specific code in memheap/base should also move to 
a component.
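
To make the suggestion concrete, here is a rough sketch of what a 
component-level configure.m4 could look like.  It is only an illustration, 
not a patch: the component name "verbs_example" and its path are made up, and 
it assumes that OMPI_CHECK_OPENFABRICS takes (prefix, action-if-found, 
action-if-not-found) and sets prefix_CPPFLAGS / prefix_LDFLAGS / prefix_LIBS, 
which is how other components appear to use it.

```m4
# Hypothetical file: oshmem/mca/memheap/verbs_example/configure.m4
# "verbs_example" is a made-up component name, used for illustration only.

# MCA_oshmem_memheap_verbs_example_CONFIG([action-if-can-compile],
#                                         [action-if-cant-compile])
AC_DEFUN([MCA_oshmem_memheap_verbs_example_CONFIG],[
    AC_CONFIG_FILES([oshmem/mca/memheap/verbs_example/Makefile])

    # Run the verbs check here, during component setup, i.e., before the
    # final *FLAGS processing that loads LDFLAGS with libtool-only flags.
    # Assumed signature: (prefix, action-if-found, action-if-not-found).
    OMPI_CHECK_OPENFABRICS([memheap_verbs_example],
                           [memheap_verbs_example_happy=yes],
                           [memheap_verbs_example_happy=no])

    # Tell the build system whether this component can be built.
    AS_IF([test "$memheap_verbs_example_happy" = "yes"],
          [$1],
          [$2])

    # Keep the resulting flags local to the component.  Its Makefile.am
    # would reference these, so they never reach the wrapper compilers.
    AC_SUBST([memheap_verbs_example_CPPFLAGS])
    AC_SUBST([memheap_verbs_example_LDFLAGS])
    AC_SUBST([memheap_verbs_example_LIBS])
])
```

With something along these lines, the verbs tests run with the rest of the 
component tests, and libibverbs is linked only into the component that needs 
it.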

Am I misunderstanding this?  Can you explain this in more detail?

If it would be helpful, we can discuss this on a webex next week, or somesuch.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
