Hi Folks,

Is the opal library explicitly closed by a dlclose?  

I don't think there's anything wrong with using ctors/dtors in shared libraries,
but one does need to make sure that these functions make no assumptions about
their ordering with respect to other ctors/dtors. Shared libraries explicitly
loaded/unloaded by the executable should have fewer of these ordering
issues.
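
As a rough illustration of a destructor that makes no ordering assumptions,
here is a minimal sketch. All names (demo_init, demo_finalize, demo_classes,
etc.) are hypothetical stand-ins, not actual OPAL code. It assumes GCC or
Clang, since it relies on __attribute__((destructor)).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for library state; none of these names are OPAL's. */
static int  demo_users   = 0;     /* outstanding init calls */
static int *demo_classes = NULL;  /* state that must survive finalize */

/* Safe to call again after demo_finalize(): state is created only if absent. */
int demo_init(void)
{
    assert(demo_users >= 0);
    if (NULL == demo_classes) {
        demo_classes = calloc(16, sizeof(*demo_classes));
        if (NULL == demo_classes) {
            return -1;
        }
    }
    ++demo_users;
    return 0;
}

/* Drops a reference but deliberately leaves demo_classes allocated. */
int demo_finalize(void)
{
    if (demo_users <= 0) {
        return -1;
    }
    --demo_users;
    return 0;
}

/* Reports whether the long-lived state is currently allocated. */
int demo_state_alive(void)
{
    return NULL != demo_classes;
}

/*
 * Runs at normal process exit, or when the containing shared library is
 * unloaded with dlclose(). It touches only this file's own state, so it
 * does not depend on the order in which other ctors/dtors run, and it is
 * harmless if there is nothing left to free.
 */
__attribute__((destructor))
static void demo_library_dtor(void)
{
    free(demo_classes);
    demo_classes = NULL;
    demo_users = 0;
}
```

In this sketch the finalize call leaves the long-lived state alone and the
destructor does the final free, so an init/finalize/init sequence keeps
working. Whether that split suits OPAL is a separate question.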

Also, care needs to be taken with static linking. It may be necessary to use
--whole-archive etc. on the ld line to get the ctors/dtors actually linked into
the executable.

Howard


-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Tuesday, July 15, 2014 12:45 PM
To: Open MPI Developers; Hjelm, Nathan Thomas
Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to 
opal

I withdraw my comment on this. It turns out I “misspoke” (or in other words I 
was wrong about the class cleanup). The base class structures are stored as 
objects in the corresponding shared library memory region, and these regions 
become unavailable once a shared library is unloaded. As a result we are 
utterly unable to clean up the classes at the OPAL layer after the other shared 
libraries have been unloaded.

Moreover, Nathan was right in his proposal: the only possible cleanup approach 
is to use the destructor attribute of the OPAL library to clean up the mess once 
all libraries are unloaded.

  George.



On July 15, 2014 at 1:17:26 AM, George Bosilca (bosi...@icl.utk.edu) wrote:
> Nathan,
>  
> Fixing the classes to correctly tear down everything was a two-line 
> patch. However, this doesn’t fix the bigger issue, which is that not 
> all frameworks are correctly torn down, that when they are they leave 
> behind char* parameters not set to NULL, and that the framework 
> infrastructure is not keen on being reinitialized because too many 
> globals are not correctly handled.
>  
> If I correctly understand the meaning of the proposed destructor 
> approach, it is only called when the library is being unloaded or when 
> the application exits. Thus, adding the destructor is a band-aid, 
> addressing a marginal annoyance (partially keeping valgrind
> happy) without addressing the real issue (being able to call MPI_Init after 
> MPI_T_finalize).  
>  
> George.
>  
>  
>  
> On July 14, 2014 at 6:07:08 PM, Nathan Hjelm (hje...@lanl.gov) wrote:
> >
> > What: Add a library destructor function to OPAL. The new function 
> > would take care of cleaning up some of OPAL's state (closing 
> > frameworks, shutting down MCA, etc).
> >
> > Why: OPAL cannot currently be re-initialized. There are numerous 
> > problems throughout the project that will make it difficult (but not
> > impossible) to get OPAL into a state where we can allow 
> > re-initialization. Additionally, there are probably arguments 
> > against making OPAL re-initializable.
> >
> > opal not being re-initializable would not normally be a problem 
> > except that the following code sequence always crashes:
> >
> > MPI_T_Init_thread (); <-- Calls opal_init_util()
> > MPI_T_Finalize (); <-- Calls opal_finalize_util()
> >
> > MPI_Init (); <-- SEGV
> >
> > This happens because MPI_T_Finalize() calls opal_finalize_util() to 
> > ensure maximum valgrind cleanness. This call causes OPAL to tear 
> > down OPAL classes (among other things) leading to the SEGV on the 
> > next call to opal_init()/opal_init_util(). There is an open ticket on this 
> > issue:
> >
> > https://svn.open-mpi.org/trac/ompi/ticket/4490
> >
> > To fix this problem I want to add a destructor function to OPAL. 
> > This function would take on some of the current functionality of 
> > opal_finalize_util(). This would solve the above issue without 
> > having to update OPAL to allow re-initialization.
> >
> > For those not familiar with destructor functions: they are called 
> > at normal program exit or when the library is closed 
> > (dlclose). Multiple destructor functions can be defined. Marking a 
> > function as a destructor is simple:
> >
> > void __attribute__((destructor)) foo (void);
> >
> >
> > When: Setting a timeout for next Friday (July 25).
> >
> >
> > -Nathan
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/07/15140.php
>  
>  

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15150.php
