I see several main categories of leaks at the current head (I did not test 
before Ralph's changes):

- a bunch of RTE leaks
  --> which is think is what Ralph is trying to pare down

- OMPI pre-defined attribute leaks
  --> This will take a little thinking to fix; it's complicated

- OMPI pre-defined datatype leaks
  --> George says these can't be fixed, but I don't believe it :-)

- BML cleanup leaks
  --> Not sure what's happening here yet

- PML allocator leaks (ob1, csum, bfo)
  --> it looks like much is allocated (including the allocator) during 
component_open, but it is only freed during component_fini -- NOT during 
component_close.  So unselected PMLs (e.g., csum, bfo) have leaks.  This seems 
like a mistake -- should that which is freed in component_fini be moved to 
component_close?

- some miscellaneous leaks
  --> I just committed fixes for these in r24865



On Jul 8, 2011, at 9:37 AM, George Bosilca wrote:

> What memory leaks? My valgrind claims there are basically none at the MPI 
> level, so I'm wondering ...
> 
> First, let's make sure everyone understand why there are two initialization 
> functions in OPAL. One has to call opal_init_util before anything else 
> otherwise no access to utilities dealing with the command line, 
> configurations file, output system, help and SOS stuff and so on will be 
> available. Initializing OPAL adds all the other frameworks: the MCA base, 
> memory, backtrace, carto and friends.
> 
> Now let's look how this is used from the MPI perspective:
> 
> ompi_mpi_init.c: 309 -> opal_init_util
> 
> ompi_mpi_init.c:357 -> orte_init
> 
>  orte_init.c:81 -> opal_init
> 
>    opal_init.c:335 -> opal_init_util
> 
> ------------------------------------
> 
> ompi_mpi_finalize.c:430  -> orte_finalize
> 
>  orte_finalize.c:42  -> opal_finalize
> 
>    opal_finalize.c:163  -> opal_finalize_util
> 
> ompi_mpi_finalize.c:434  -> opal_finalize_util
> 
> So we have opal_init * 1 and opal_util * 2. Clearly the opal util is not a 
> simple ON/OFF stuff. With Ralph patch the OPAL utilities will disappear as 
> soon as the OMPI layer call orte_fini. Luckily, today there is nothing 
> between the call to orte_fini and opal_finalize_util, so we're safe from a 
> segfault.
> 
> Moreover, from a software engineering point of view there are two choices for 
> allowing library composition (ORTE using OPAL, OMPI using ORTE and OPAL, 
> something else using OMPI and ORTE and OPAL). Either you do the management at 
> the lowest level using counters, or you provide accessors to check the 
> init/fini state of the library and do the management at the upper level 
> (similar to the MPI library). In Open MPI and this for the last 7 years we 
> chose the first approach. And so far there was no compelling case to switch.
> 
>  george.
> 
> On Jul 8, 2011, at 14:42 , Jeff Squyres wrote:
> 
>> Developers --
>> 
>> Ralph and I talked about this one the other day; he's trying to plug some 
>> memory leaks during finalize.  
>> 
>> Before this commit, there was a counter that counts how many times 
>> opal_init*() is invoked.  opal_finalize() decremented the counter, but 
>> didn't actually do anything else until the counter went to zero.  The 
>> original intent was that we should call finalize exactly as many times as we 
>> call initialize, but it appears that we didn't do that (e.g., we call 
>> opal_init_util() and then opal_init(), but then only call opal_finalize() 
>> once).
>> 
>> Does anyone have a reason NOT to convert the init/finalize counter to a 
>> simple boolean?  (i.e., can you provide a reason to back this change out?)
>> 
>> 
>> 
>> On Jul 8, 2011, at 2:43 AM, r...@osl.iu.edu wrote:
>> 
>>> Author: rhc
>>> Date: 2011-07-08 02:43:19 EDT (Fri, 08 Jul 2011)
>>> New Revision: 24862
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/24862
>>> 
>>> Log:
>>> Replace a useless counter with a boolean check to see if we have already 
>>> passed thru opal_finalize so we don't call finalize, and then don't pass 
>>> thru it (as was happening on several tools)
>>> 
>>> Text files modified: 
>>> trunk/opal/runtime/opal_finalize.c |    16 ++++++----------                 
>>>        
>>> trunk/opal/runtime/opal_init.c     |    16 ++++++----------                 
>>>        
>>> trunk/orte/tools/orterun/orterun.c |     4 ----                             
>>>        
>>> 3 files changed, 12 insertions(+), 24 deletions(-)
>>> 
>>> Modified: trunk/opal/runtime/opal_finalize.c
>>> ==============================================================================
>>> --- trunk/opal/runtime/opal_finalize.c      (original)
>>> +++ trunk/opal/runtime/opal_finalize.c      2011-07-08 02:43:19 EDT (Fri, 
>>> 08 Jul 2011)
>>> @@ -54,18 +54,16 @@
>>> #include "opal/runtime/opal_cr.h"
>>> #include "opal/mca/crs/base/base.h"
>>> 
>>> -extern int opal_initialized;
>>> -extern int opal_util_initialized;
>>> +extern bool opal_initialized;
>>> +extern bool opal_util_initialized;
>>> 
>>> int
>>> opal_finalize_util(void)
>>> {
>>> -    if( --opal_util_initialized != 0 ) {
>>> -        if( opal_util_initialized < 0 ) {
>>> -            return OPAL_ERROR;
>>> -        }
>>> +    if (!opal_util_initialized) {
>>>       return OPAL_SUCCESS;
>>>   }
>>> +    opal_util_initialized = false;
>>> 
>>>   /* Clear out all the registered MCA params */
>>>   mca_base_param_finalize();
>>> @@ -111,12 +109,10 @@
>>> int
>>> opal_finalize(void)
>>> {
>>> -    if( --opal_initialized != 0 ) {
>>> -        if( opal_initialized < 0 ) {
>>> -            return OPAL_ERROR;
>>> -        }
>>> +    if (!opal_initialized) {
>>>       return OPAL_SUCCESS;
>>>   }
>>> +    opal_initialized = false;
>>> 
>>>   /* close the checkpoint and restart service */
>>>   opal_cr_finalize();
>>> 
>>> Modified: trunk/opal/runtime/opal_init.c
>>> ==============================================================================
>>> --- trunk/opal/runtime/opal_init.c  (original)
>>> +++ trunk/opal/runtime/opal_init.c  2011-07-08 02:43:19 EDT (Fri, 08 Jul 
>>> 2011)
>>> @@ -68,8 +68,8 @@
>>> #endif
>>> const char opal_version_string[] = OPAL_IDENT_STRING;
>>> 
>>> -int opal_initialized = 0;
>>> -int opal_util_initialized = 0;
>>> +bool opal_initialized = false;
>>> +bool opal_util_initialized = false;
>>> int opal_cache_line_size;
>>> 
>>> static int
>>> @@ -219,12 +219,10 @@
>>>   int ret;
>>>   char *error = NULL;
>>> 
>>> -    if( ++opal_util_initialized != 1 ) {
>>> -        if( opal_util_initialized < 1 ) {
>>> -            return OPAL_ERROR;
>>> -        }
>>> +    if (opal_util_initialized) {
>>>       return OPAL_SUCCESS;
>>>   }
>>> +    opal_util_initialized = true;
>>> 
>>>   /* JMS See note in runtime/opal.h -- this is temporary; to be
>>>      replaced with real hwloc information soon (in trunk/v1.5 and
>>> @@ -324,12 +322,10 @@
>>>   int ret;
>>>   char *error = NULL;
>>> 
>>> -    if( ++opal_initialized != 1 ) {
>>> -        if( opal_initialized < 1 ) {
>>> -            return OPAL_ERROR;
>>> -        }
>>> +    if (opal_initialized) {
>>>       return OPAL_SUCCESS;
>>>   }
>>> +    opal_initialized = true;
>>> 
>>>   /* initialize util code */
>>>   if (OPAL_SUCCESS != (ret = opal_init_util(pargc, pargv))) {
>>> 
>>> Modified: trunk/orte/tools/orterun/orterun.c
>>> ==============================================================================
>>> --- trunk/orte/tools/orterun/orterun.c      (original)
>>> +++ trunk/orte/tools/orterun/orterun.c      2011-07-08 02:43:19 EDT (Fri, 
>>> 08 Jul 2011)
>>> @@ -618,10 +618,6 @@
>>>       ORTE_ERROR_LOG(rc);
>>>       return rc;
>>>   }
>>> -    /* finalize the OPAL utils. As they are opened again from 
>>> orte_init->opal_init
>>> -     * we continue to have a reference count on them. So we have to 
>>> finalize them twice...
>>> -     */
>>> -    opal_finalize_util();
>>> 
>>>   /* check for request to report uri */
>>>   if (NULL != orterun_globals.report_uri) {
>>> _______________________________________________
>>> svn-full mailing list
>>> svn-f...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


Reply via email to