Ick. I wondered aloud on IM to Terry after your earlier emails if we should just custom-patch ltdl in OMPI to fix this issue. The problem is that libltdl is effectively reporting the "wrong" error back to OMPI, so the error string that we get to print out ends up not being very useful (e.g., not showing which symbol was missing, or what the problem was with the dlopen). Fixing this properly in libltdl is actually somewhat tricky -- which is why it hasn't been fixed yet. But given that OMPI's use of libltdl is pretty specific, we might be able to get away with a simple fix that works just for OMPI (but wouldn't necessarily be suitable for all other libltdl users).
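In the meantime, one way to see the loader's own error text without going through libltdl is to dlopen the component by hand. Here is a minimal Python sketch using ctypes, which calls dlopen() on POSIX systems and raises OSError carrying the dlerror() string. The helper name `dlopen_error` and its `preload` parameter are made up for illustration, and the paths in the usage comment are placeholders. Preloading matters because a component normally resolves symbols such as opal_output from libraries already loaded in the Open MPI process; without that, the loader may just complain about the first of those.

```python
import ctypes


def dlopen_error(component_path, preload=()):
    """Return the dynamic loader's error text for component_path.

    Returns None if the file loads cleanly. `preload` is a hypothetical
    convenience: libraries to load first with RTLD_GLOBAL so that their
    symbols are visible to the component, roughly imitating the state of
    a running Open MPI process.
    """
    for lib in preload:
        # Fails loudly (OSError) if a preload library itself cannot load.
        ctypes.CDLL(lib, mode=ctypes.RTLD_GLOBAL)
    try:
        ctypes.CDLL(component_path, mode=ctypes.RTLD_GLOBAL)
    except OSError as exc:
        # On POSIX systems this message comes from dlerror().
        return str(exc)
    return None


# Example use (placeholder paths, adjust to your install):
#   print(dlopen_error("/path/to/lib/openmpi/mca_vprotocol_receiver.so",
#                      preload=["/path/to/lib/libmpi.so"]))
```

If this prints something like an "undefined symbol" message, that is the detail the component_find warning is currently hiding. It is only a diagnostic aid, not the fix for libltdl discussed above.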
Hmmm... This looks do-able. I'll commit in a bit.

On Mar 5, 2010, at 1:27 PM, Leonardo Fialho wrote:

> I see... but it is really strange because this module is clean, it does not
> use nothing. This is the output of the nm command, I can't see any symbol
> which is not available.
>
> [lfialho@aoclsb-clus openmpi]$ nm mca_vprotocol_receiver.so
> 0000000000201208 a _DYNAMIC
> 0000000000201408 a _GLOBAL_OFFSET_TABLE_
>                  w _Jv_RegisterClasses
> 00000000002011e0 d __CTOR_END__
> 00000000002011d8 d __CTOR_LIST__
> 00000000002011f0 d __DTOR_END__
> 00000000002011e8 d __DTOR_LIST__
> 00000000000011d0 r __FRAME_END__
> 00000000002011f8 d __JCR_END__
> 00000000002011f8 d __JCR_LIST__
> 0000000000201640 A __bss_start
>                  w __cxa_finalize@@GLIBC_2.2.5
> 0000000000000d40 t __do_global_ctors_aux
> 00000000000007c0 t __do_global_dtors_aux
> 0000000000201200 d __dso_handle
>                  w __gmon_start__
> 0000000000201640 A _edata
> 0000000000201648 A _end
> 0000000000000d78 T _fini
> 0000000000000750 T _init
> 00000000000007a0 t call_gmon_start
> 0000000000201640 b completed.6115
> 0000000000000810 t frame_dummy
>                  U mca_pml_v
> 0000000000201460 D mca_vprotocol_receiver
> 0000000000000c71 t mca_vprotocol_receiver_add_comm
> 0000000000000a5f t mca_vprotocol_receiver_add_procs
> 0000000000201540 D mca_vprotocol_receiver_component
> 0000000000000cc3 t mca_vprotocol_receiver_component_close
> 0000000000000d18 t mca_vprotocol_receiver_component_finalize
> 0000000000000cce t mca_vprotocol_receiver_component_init
> 0000000000000cb8 t mca_vprotocol_receiver_component_open
> 0000000000000c93 t mca_vprotocol_receiver_del_comm
> 0000000000000a89 t mca_vprotocol_receiver_del_procs
> 000000000000083c t mca_vprotocol_receiver_dump
> 0000000000000d23 t mca_vprotocol_receiver_enable
> 00000000000009e7 t mca_vprotocol_receiver_iprobe
> 0000000000000b9a t mca_vprotocol_receiver_irecv
> 0000000000000ab3 t mca_vprotocol_receiver_isend
> 0000000000000a29 t mca_vprotocol_receiver_probe
> 0000000000000c00 t mca_vprotocol_receiver_recv
> 0000000000000b21 t mca_vprotocol_receiver_send
> 00000000000009bd T mca_vprotocol_receiver_start
> 0000000000000864 t mca_vprotocol_receiver_test
> 0000000000000896 t mca_vprotocol_receiver_test_all
> 00000000000008d0 t mca_vprotocol_receiver_test_any
> 0000000000000950 t mca_vprotocol_receiver_test_some
> 0000000000000916 t mca_vprotocol_receiver_wait_any
> 000000000000098a t mca_vprotocol_receiver_wait_some
>                  U ompi_request_null
>                  U opal_output
> 0000000000201440 d p.6113
> [lfialho@aoclsb-clus openmpi]$
>
> On Mar 5, 2010, at 7:00 PM, Terry Dontje wrote:
>
> > Sorry meant to add this, but you might be able to try and find the symbol
> > causing the issue by twiddling with LD_DEBUG
> >
> > --td
> > Terry Dontje wrote:
> >> Possibly there is an external symbol in the .so that is being loaded that
> >> cannot be resolved.
> >> --td
> >> Leonardo Fialho wrote:
> >>> Hi,
> >>>
> >>> I know that libtool does not help us to find the source of this error,
> >>> but, what can generate the following error?
> >>>
> >>> [aoclsb-clus.uab.es:11724] mca: base: component_find: unable to open
> >>> /home/lfialho/lib/openmpi/mca_vprotocol_receiver: perhaps a missing
> >>> symbol, or compiled for a different version of Open MPI? (ignored)
> >>>
> >>> 1) yes, the file exists
> >>> 2) yes, it has been compiled among all other components
> >>> 3) yes, it is the same Open MPI version
> >>> 4) this component is a copy of the pessimist component implemented by
> >>> Aurelien
> >>> 5) Aurelien's component presents the same error
> >>>
> >>> The question is: what mistake should generate an error during module
> >>> loading?
> >>>
> >>> Thanks in advance,
> >>> Leonardo
> >>> _______________________________________________
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/