What memory leaks? My valgrind claims there are basically none at the MPI 
level, so I'm wondering ...

First, let's make sure everyone understand why there are two initialization 
functions in OPAL. One has to call opal_init_util before anything else 
otherwise no access to utilities dealing with the command line, configurations 
file, output system, help and SOS stuff and so on will be available. 
Initializing OPAL adds all the other frameworks: the MCA base, memory, 
backtrace, carto and friends.

Now let's look how this is used from the MPI perspective:

ompi_mpi_init.c: 309 -> opal_init_util

ompi_mpi_init.c:357 -> orte_init

  orte_init.c:81 -> opal_init

    opal_init.c:335 -> opal_init_util

------------------------------------

ompi_mpi_finalize.c:430  -> orte_finalize

  orte_finalize.c:42  -> opal_finalize

    opal_finalize.c:163  -> opal_finalize_util

ompi_mpi_finalize.c:434  -> opal_finalize_util

So we have opal_init * 1 and opal_util * 2. Clearly the opal util is not a 
simple ON/OFF stuff. With Ralph patch the OPAL utilities will disappear as soon 
as the OMPI layer call orte_fini. Luckily, today there is nothing between the 
call to orte_fini and opal_finalize_util, so we're safe from a segfault.

Moreover, from a software engineering point of view there are two choices for 
allowing library composition (ORTE using OPAL, OMPI using ORTE and OPAL, 
something else using OMPI and ORTE and OPAL). Either you do the management at 
the lowest level using counters, or you provide accessors to check the 
init/fini state of the library and do the management at the upper level 
(similar to the MPI library). In Open MPI and this for the last 7 years we 
chose the first approach. And so far there was no compelling case to switch.

  george.

On Jul 8, 2011, at 14:42 , Jeff Squyres wrote:

> Developers --
> 
> Ralph and I talked about this one the other day; he's trying to plug some 
> memory leaks during finalize.  
> 
> Before this commit, there was a counter that counts how many times 
> opal_init*() is invoked.  opal_finalize() decremented the counter, but didn't 
> actually do anything else until the counter went to zero.  The original 
> intent was that we should call finalize exactly as many times as we call 
> initialize, but it appears that we didn't do that (e.g., we call 
> opal_init_util() and then opal_init(), but then only call opal_finalize() 
> once).
> 
> Does anyone have a reason NOT to convert the init/finalize counter to a 
> simple boolean?  (i.e., can you provide a reason to back this change out?)
> 
> 
> 
> On Jul 8, 2011, at 2:43 AM, r...@osl.iu.edu wrote:
> 
>> Author: rhc
>> Date: 2011-07-08 02:43:19 EDT (Fri, 08 Jul 2011)
>> New Revision: 24862
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/24862
>> 
>> Log:
>> Replace a useless counter with a boolean check to see if we have already 
>> passed thru opal_finalize so we don't call finalize, and then don't pass 
>> thru it (as was happening on several tools)
>> 
>> Text files modified: 
>>  trunk/opal/runtime/opal_finalize.c |    16 ++++++----------                 
>>        
>>  trunk/opal/runtime/opal_init.c     |    16 ++++++----------                 
>>        
>>  trunk/orte/tools/orterun/orterun.c |     4 ----                             
>>        
>>  3 files changed, 12 insertions(+), 24 deletions(-)
>> 
>> Modified: trunk/opal/runtime/opal_finalize.c
>> ==============================================================================
>> --- trunk/opal/runtime/opal_finalize.c       (original)
>> +++ trunk/opal/runtime/opal_finalize.c       2011-07-08 02:43:19 EDT (Fri, 
>> 08 Jul 2011)
>> @@ -54,18 +54,16 @@
>> #include "opal/runtime/opal_cr.h"
>> #include "opal/mca/crs/base/base.h"
>> 
>> -extern int opal_initialized;
>> -extern int opal_util_initialized;
>> +extern bool opal_initialized;
>> +extern bool opal_util_initialized;
>> 
>> int
>> opal_finalize_util(void)
>> {
>> -    if( --opal_util_initialized != 0 ) {
>> -        if( opal_util_initialized < 0 ) {
>> -            return OPAL_ERROR;
>> -        }
>> +    if (!opal_util_initialized) {
>>        return OPAL_SUCCESS;
>>    }
>> +    opal_util_initialized = false;
>> 
>>    /* Clear out all the registered MCA params */
>>    mca_base_param_finalize();
>> @@ -111,12 +109,10 @@
>> int
>> opal_finalize(void)
>> {
>> -    if( --opal_initialized != 0 ) {
>> -        if( opal_initialized < 0 ) {
>> -            return OPAL_ERROR;
>> -        }
>> +    if (!opal_initialized) {
>>        return OPAL_SUCCESS;
>>    }
>> +    opal_initialized = false;
>> 
>>    /* close the checkpoint and restart service */
>>    opal_cr_finalize();
>> 
>> Modified: trunk/opal/runtime/opal_init.c
>> ==============================================================================
>> --- trunk/opal/runtime/opal_init.c   (original)
>> +++ trunk/opal/runtime/opal_init.c   2011-07-08 02:43:19 EDT (Fri, 08 Jul 
>> 2011)
>> @@ -68,8 +68,8 @@
>> #endif
>> const char opal_version_string[] = OPAL_IDENT_STRING;
>> 
>> -int opal_initialized = 0;
>> -int opal_util_initialized = 0;
>> +bool opal_initialized = false;
>> +bool opal_util_initialized = false;
>> int opal_cache_line_size;
>> 
>> static int
>> @@ -219,12 +219,10 @@
>>    int ret;
>>    char *error = NULL;
>> 
>> -    if( ++opal_util_initialized != 1 ) {
>> -        if( opal_util_initialized < 1 ) {
>> -            return OPAL_ERROR;
>> -        }
>> +    if (opal_util_initialized) {
>>        return OPAL_SUCCESS;
>>    }
>> +    opal_util_initialized = true;
>> 
>>    /* JMS See note in runtime/opal.h -- this is temporary; to be
>>       replaced with real hwloc information soon (in trunk/v1.5 and
>> @@ -324,12 +322,10 @@
>>    int ret;
>>    char *error = NULL;
>> 
>> -    if( ++opal_initialized != 1 ) {
>> -        if( opal_initialized < 1 ) {
>> -            return OPAL_ERROR;
>> -        }
>> +    if (opal_initialized) {
>>        return OPAL_SUCCESS;
>>    }
>> +    opal_initialized = true;
>> 
>>    /* initialize util code */
>>    if (OPAL_SUCCESS != (ret = opal_init_util(pargc, pargv))) {
>> 
>> Modified: trunk/orte/tools/orterun/orterun.c
>> ==============================================================================
>> --- trunk/orte/tools/orterun/orterun.c       (original)
>> +++ trunk/orte/tools/orterun/orterun.c       2011-07-08 02:43:19 EDT (Fri, 
>> 08 Jul 2011)
>> @@ -618,10 +618,6 @@
>>        ORTE_ERROR_LOG(rc);
>>        return rc;
>>    }
>> -    /* finalize the OPAL utils. As they are opened again from 
>> orte_init->opal_init
>> -     * we continue to have a reference count on them. So we have to 
>> finalize them twice...
>> -     */
>> -    opal_finalize_util();
>> 
>>    /* check for request to report uri */
>>    if (NULL != orterun_globals.report_uri) {
>> _______________________________________________
>> svn-full mailing list
>> svn-f...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Reply via email to