Ick. 

I wondered aloud on IM to Terry after your earlier emails if we should just 
custom-patch ltdl in OMPI to fix this issue.  The problem is that libltdl is 
effectively reporting the "wrong" error back to OMPI, so the error string that 
we get to print out ends up not being very useful (e.g., not showing which 
symbol was missing, or what the problem was with the dlopen).  Fixing this 
properly in libltdl is actually somewhat tricky -- which is why it hasn't been 
fixed yet.  But given that OMPI's use of libltdl is pretty specific, we might 
be able to get away with a simple fix that works just for OMPI (but wouldn't 
necessarily be suitable for all other libltdl users).

Hmmm...

This looks do-able.  I'll commit in a bit.



On Mar 5, 2010, at 1:27 PM, Leonardo Fialho wrote:

> I see... but it is really strange because this module is clean; it does not 
> use anything. This is the output of the nm command; I can't see any symbol 
> that is not available.
> 
> [lfialho@aoclsb-clus openmpi]$ nm mca_vprotocol_receiver.so
> 0000000000201208 a _DYNAMIC
> 0000000000201408 a _GLOBAL_OFFSET_TABLE_
>                  w _Jv_RegisterClasses
> 00000000002011e0 d __CTOR_END__
> 00000000002011d8 d __CTOR_LIST__
> 00000000002011f0 d __DTOR_END__
> 00000000002011e8 d __DTOR_LIST__
> 00000000000011d0 r __FRAME_END__
> 00000000002011f8 d __JCR_END__
> 00000000002011f8 d __JCR_LIST__
> 0000000000201640 A __bss_start
>                  w __cxa_finalize@@GLIBC_2.2.5
> 0000000000000d40 t __do_global_ctors_aux
> 00000000000007c0 t __do_global_dtors_aux
> 0000000000201200 d __dso_handle
>                  w __gmon_start__
> 0000000000201640 A _edata
> 0000000000201648 A _end
> 0000000000000d78 T _fini
> 0000000000000750 T _init
> 00000000000007a0 t call_gmon_start
> 0000000000201640 b completed.6115
> 0000000000000810 t frame_dummy
>                  U mca_pml_v
> 0000000000201460 D mca_vprotocol_receiver
> 0000000000000c71 t mca_vprotocol_receiver_add_comm
> 0000000000000a5f t mca_vprotocol_receiver_add_procs
> 0000000000201540 D mca_vprotocol_receiver_component
> 0000000000000cc3 t mca_vprotocol_receiver_component_close
> 0000000000000d18 t mca_vprotocol_receiver_component_finalize
> 0000000000000cce t mca_vprotocol_receiver_component_init
> 0000000000000cb8 t mca_vprotocol_receiver_component_open
> 0000000000000c93 t mca_vprotocol_receiver_del_comm
> 0000000000000a89 t mca_vprotocol_receiver_del_procs
> 000000000000083c t mca_vprotocol_receiver_dump
> 0000000000000d23 t mca_vprotocol_receiver_enable
> 00000000000009e7 t mca_vprotocol_receiver_iprobe
> 0000000000000b9a t mca_vprotocol_receiver_irecv
> 0000000000000ab3 t mca_vprotocol_receiver_isend
> 0000000000000a29 t mca_vprotocol_receiver_probe
> 0000000000000c00 t mca_vprotocol_receiver_recv
> 0000000000000b21 t mca_vprotocol_receiver_send
> 00000000000009bd T mca_vprotocol_receiver_start
> 0000000000000864 t mca_vprotocol_receiver_test
> 0000000000000896 t mca_vprotocol_receiver_test_all
> 00000000000008d0 t mca_vprotocol_receiver_test_any
> 0000000000000950 t mca_vprotocol_receiver_test_some
> 0000000000000916 t mca_vprotocol_receiver_wait_any
> 000000000000098a t mca_vprotocol_receiver_wait_some
>                  U ompi_request_null
>                  U opal_output
> 0000000000201440 d p.6113
> [lfialho@aoclsb-clus openmpi]$
> 
> On Mar 5, 2010, at 7:00 PM, Terry Dontje wrote:
> 
> > Sorry, I meant to add this: you might be able to find the symbol 
> > causing the issue by setting the LD_DEBUG environment variable.
> >
> > --td
> > Terry Dontje wrote:
> >> Possibly there is an external symbol in the .so that is being loaded that 
> >> cannot be resolved.
> >> --td
> >> Leonardo Fialho wrote:
> >>> Hi,
> >>>
> >>> I know that libtool does not help us find the source of this error, 
> >>> but what could cause the following error?
> >>>
> >>> [aoclsb-clus.uab.es:11724] mca: base: component_find: unable to open 
> >>> /home/lfialho/lib/openmpi/mca_vprotocol_receiver: perhaps a missing 
> >>> symbol, or compiled for a different version of Open MPI? (ignored)
> >>>
> >>> 1) yes, the file exists
> >>> 2) yes, it has been compiled among all other components
> >>> 3) yes, it is the same Open MPI version
> >>> 4) this component is a copy of the pessimist component implemented by 
> >>> Aurelien
> >>> 5) Aurelien's component presents the same error
> >>>
> >>> The question is: what kind of mistake could cause an error during module 
> >>> loading?
> >>>
> >>> Thanks in advance,
> >>> Leonardo
> >>> _______________________________________________
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>> 
> >>
> >
> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

