Terry, mca_pml_v is declared in a .so, and at load time it should export 
the symbol. But this component loads other modules, such as 
mca_vprotocol_pessimist or mca_vprotocol_receiver (in my case). The symbol is 
declared in pml_v.c, which acts as a pseudo-framework that loads other 
components, vprotocol_pessimist for example.

As George said, the problem is that mca_pml_v is dynamically loaded and then 
loads mca_vprotocol_receiver, which uses the problematic symbol. The symbol 
should be available among the global symbols. I don't know why, but that is 
not happening.
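
To make the scoping question concrete, here is a minimal sketch using plain 
dlopen() rather than libltdl (that substitution is my assumption, as are the 
helper names open_component and symbol_is_global, which are not Open MPI 
functions). It opens a component with or without RTLD_GLOBAL and then asks 
whether a symbol can be found in the process-wide scope, which is where a 
later-loaded module's undefined symbols are resolved.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a component, optionally exporting its
 * symbols to the global scope so later-loaded modules can bind to them. */
void *open_component(const char *path, int make_global)
{
    assert(path != NULL);
    int flags = RTLD_NOW | (make_global ? RTLD_GLOBAL : RTLD_LOCAL);
    void *handle = dlopen(path, flags);
    if (handle == NULL) {
        /* dlerror() carries the loader's own message for the failure. */
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    }
    return handle;
}

/* Hypothetical helper: return 1 if `name` resolves in the global scope
 * (main program, startup libraries, and RTLD_GLOBAL objects), else 0. */
int symbol_is_global(const char *name)
{
    void *self = dlopen(NULL, RTLD_NOW);  /* handle for the main program */
    if (self == NULL) {
        return 0;
    }
    dlerror();                            /* clear any stale error state */
    (void)dlsym(self, name);
    int found = (dlerror() == NULL);
    dlclose(self);
    return found;
}
```

Calling symbol_is_global("mca_pml_v") right after the parent component is 
opened would show whether the symbol reached the global scope before 
mca_vprotocol_receiver is loaded. On older glibc this needs -ldl at link time.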

Jeff, it is really good to have better output for this kind of error, but 
it does not change the problem. I think vprotocol is the only component 
that loads other components in this way. But all components are loaded by 
libopal in the same way, no?

Leonardo


On Mar 6, 2010, at 12:27 AM, Jeff Squyres wrote:

> We already use global symbols; mca_base_component_repository.c invokes:
> 
>    if (lt_dladvise_global(&opal_mca_dladvise)) {
>        return OPAL_ERROR;
>    }
> 
> 
> On Mar 5, 2010, at 6:18 PM, George Bosilca wrote:
> 
>> Unfortunately this will not fix his issues ;( I'm pretty sure that his problem 
>> is related to the fact that mca_pml_v is exported by another dynamic module, 
>> and therefore not available via dlsym. I don't think there is a simple 
>> solution for this problem, except going back to GLOBAL symbols.
>> 
>>  george.
>> 
>> On Mar 5, 2010, at 18:02 , Jeff Squyres wrote:
>> 
>>> Ick.
>>> 
>>> I wondered aloud on IM to Terry after your earlier emails if we should just 
>>> custom-patch ltdl in OMPI to fix this issue.  The problem is that libltdl 
>>> is effectively reporting the "wrong" error back to OMPI, so the error 
>>> string that we get to print out ends up not being very useful (e.g., not 
>>> showing which symbol was missing, or what the problem was with the dlopen). 
>>>  Fixing this properly in libltdl is actually somewhat tricky -- which is 
>>> why it hasn't been fixed yet.  But given that OMPI's use of libltdl is 
>>> pretty specific, we might be able to get away with a simple fix that works 
>>> just for OMPI (but wouldn't necessarily be suitable for all other libltdl 
>>> users).
>>> 
>>> Hmmm...
>>> 
>>> This looks do-able.  I'll commit in a bit.
>>> 
>>> 
>>> 
>>> On Mar 5, 2010, at 1:27 PM, Leonardo Fialho wrote:
>>> 
>>>> I see... but it is really strange because this module is clean; it does 
>>>> not use anything. This is the output of the nm command; I can't see any 
>>>> symbol which is not available.
>>>> 
>>>> [lfialho@aoclsb-clus openmpi]$ nm mca_vprotocol_receiver.so
>>>> 0000000000201208 a _DYNAMIC
>>>> 0000000000201408 a _GLOBAL_OFFSET_TABLE_
>>>>                w _Jv_RegisterClasses
>>>> 00000000002011e0 d __CTOR_END__
>>>> 00000000002011d8 d __CTOR_LIST__
>>>> 00000000002011f0 d __DTOR_END__
>>>> 00000000002011e8 d __DTOR_LIST__
>>>> 00000000000011d0 r __FRAME_END__
>>>> 00000000002011f8 d __JCR_END__
>>>> 00000000002011f8 d __JCR_LIST__
>>>> 0000000000201640 A __bss_start
>>>>                w __cxa_finalize@@GLIBC_2.2.5
>>>> 0000000000000d40 t __do_global_ctors_aux
>>>> 00000000000007c0 t __do_global_dtors_aux
>>>> 0000000000201200 d __dso_handle
>>>>                w __gmon_start__
>>>> 0000000000201640 A _edata
>>>> 0000000000201648 A _end
>>>> 0000000000000d78 T _fini
>>>> 0000000000000750 T _init
>>>> 00000000000007a0 t call_gmon_start
>>>> 0000000000201640 b completed.6115
>>>> 0000000000000810 t frame_dummy
>>>>                U mca_pml_v
>>>> 0000000000201460 D mca_vprotocol_receiver
>>>> 0000000000000c71 t mca_vprotocol_receiver_add_comm
>>>> 0000000000000a5f t mca_vprotocol_receiver_add_procs
>>>> 0000000000201540 D mca_vprotocol_receiver_component
>>>> 0000000000000cc3 t mca_vprotocol_receiver_component_close
>>>> 0000000000000d18 t mca_vprotocol_receiver_component_finalize
>>>> 0000000000000cce t mca_vprotocol_receiver_component_init
>>>> 0000000000000cb8 t mca_vprotocol_receiver_component_open
>>>> 0000000000000c93 t mca_vprotocol_receiver_del_comm
>>>> 0000000000000a89 t mca_vprotocol_receiver_del_procs
>>>> 000000000000083c t mca_vprotocol_receiver_dump
>>>> 0000000000000d23 t mca_vprotocol_receiver_enable
>>>> 00000000000009e7 t mca_vprotocol_receiver_iprobe
>>>> 0000000000000b9a t mca_vprotocol_receiver_irecv
>>>> 0000000000000ab3 t mca_vprotocol_receiver_isend
>>>> 0000000000000a29 t mca_vprotocol_receiver_probe
>>>> 0000000000000c00 t mca_vprotocol_receiver_recv
>>>> 0000000000000b21 t mca_vprotocol_receiver_send
>>>> 00000000000009bd T mca_vprotocol_receiver_start
>>>> 0000000000000864 t mca_vprotocol_receiver_test
>>>> 0000000000000896 t mca_vprotocol_receiver_test_all
>>>> 00000000000008d0 t mca_vprotocol_receiver_test_any
>>>> 0000000000000950 t mca_vprotocol_receiver_test_some
>>>> 0000000000000916 t mca_vprotocol_receiver_wait_any
>>>> 000000000000098a t mca_vprotocol_receiver_wait_some
>>>>                U ompi_request_null
>>>>                U opal_output
>>>> 0000000000201440 d p.6113
>>>> [lfialho@aoclsb-clus openmpi]$
>>>> 
>>>> On Mar 5, 2010, at 7:00 PM, Terry Dontje wrote:
>>>> 
>>>>> Sorry meant to add this, but you might be able to try and find the symbol 
>>>>> causing the issue by twiddling with LD_DEBUG
>>>>> 
>>>>> --td
>>>>> Terry Dontje wrote:
>>>>>> Possibly there is an external symbol in the .so that is being loaded 
>>>>>> that cannot be resolved.
>>>>>> --td
>>>>>> Leonardo Fialho wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I know that libtool does not help us to find the source of this error, 
>>>>>>> but, what can generate the following error?
>>>>>>> 
>>>>>>> [aoclsb-clus.uab.es:11724] mca: base: component_find: unable to open 
>>>>>>> /home/lfialho/lib/openmpi/mca_vprotocol_receiver: perhaps a missing 
>>>>>>> symbol, or compiled for a different version of Open MPI? (ignored)
>>>>>>> 
>>>>>>> 1) yes, the file exists
>>>>>>> 2) yes, it has been compiled among all other components
>>>>>>> 3) yes, it is the same Open MPI version
>>>>>>> 4) this component is a copy of the pessimist component implemented by 
>>>>>>> Aurelien
>>>>>>> 5) Aurelien's component presents the same error
>>>>>>> 
>>>>>>> The question is: what mistake could generate an error during module 
>>>>>>> loading?
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> Leonardo
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> de...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 
> 

