I spoke with Igor Ivanov about this issue this morning. He summarized his 
findings as follows:

1. Valgrind comes up clean.
2. The issue is not reproduced with a static build.
3. A bisection study reveals that problems first appear after commit: 
https://svn.open-mpi.org/trac/ompi/changeset/28800/trunk/opal/mca/base/mca_base_var.c
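
For anyone following along, the sequence Igor describes in the quoted text
below (register, deregister, register again in "light mode", then no second
deregister) can be sketched with a toy model in C. This is only an
illustration under stated assumptions: toy_var_register, toy_var_deregister,
toy_vars_still_valid and toy_var_t are hypothetical names invented here, not
the Open MPI API, and the real mca_base_var code is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only. All names here are hypothetical, NOT the Open MPI API. */
#define TOY_MAX_VARS 8

typedef struct {
    char   name[64];   /* variable name, e.g. "device_param_files" */
    char **storage;    /* points at a string pointer owned by the component */
    bool   valid;      /* cleared on deregistration; the slot itself remains */
} toy_var_t;

static toy_var_t toy_vars[TOY_MAX_VARS];
static int toy_var_count = 0;

/* Register a string variable. If the name is already in the array, the slot
 * is reused and its fields refreshed (the "light mode" from the thread). */
int toy_var_register(const char *name, char **storage, const char *value)
{
    int i;
    char *copy = malloc(strlen(value) + 1);

    if (NULL == copy) {
        return -1;
    }
    strcpy(copy, value);

    for (i = 0; i < toy_var_count; ++i) {
        if (0 == strcmp(toy_vars[i].name, name)) {
            break;                          /* found an existing slot */
        }
    }
    if (i == toy_var_count) {               /* not found: take a new slot */
        if (TOY_MAX_VARS == toy_var_count) {
            free(copy);
            return -1;
        }
        strncpy(toy_vars[i].name, name, sizeof(toy_vars[i].name) - 1);
        ++toy_var_count;
    } else if (toy_vars[i].valid) {
        free(*toy_vars[i].storage);         /* drop the previous value */
    }
    *storage = copy;
    toy_vars[i].storage = storage;          /* refresh the storage pointer */
    toy_vars[i].valid = true;
    return i;
}

/* Deregister: free the string and mark the slot invalid, but keep the slot. */
void toy_var_deregister(int idx)
{
    assert(idx >= 0 && idx < toy_var_count);
    if (toy_vars[idx].valid) {
        free(*toy_vars[idx].storage);
        *toy_vars[idx].storage = NULL;
        toy_vars[idx].valid = false;
    }
}

/* Count the slots a final cleanup pass would still have to dereference. */
int toy_vars_still_valid(void)
{
    int i, n = 0;
    for (i = 0; i < toy_var_count; ++i) {
        n += toy_vars[i].valid ? 1 : 0;
    }
    return n;
}
```

In this model, registering the same name a second time returns the same slot
index, and toy_vars_still_valid() stays at 1 until a second deregister runs.
If the real code behaves similarly, that second deregister has to happen
before the component that owns the storage is unloaded, which is what step 6
of Igor's description says is missing.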


Josh

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Monday, December 16, 2013 12:15 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] bug in mca framework?

It might be worthwhile to run this through valgrind and see if something is 
being freed incorrectly...?


On Dec 16, 2013, at 12:11 PM, Nathan Hjelm <hje...@lanl.gov> wrote:

> I took a look at the stacktraces last week and could not identify 
> where the bug is. I will dig deeper this week and see if I can come up with 
> the correct fix.
> 
> -Nathan
> 
> On Mon, Dec 09, 2013 at 03:17:36PM +0200, Mike Dubman wrote:
>>   Nathan,
>>   Could you please comment on Igor's observations?
>>   Thanks
>> 
>>   On Wed, Dec 4, 2013 at 4:44 PM, Igor Ivanov <igor.iva...@itseez.com>
>>   wrote:
>> 
>>     On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
>> 
>>       On Dec 4, 2013, at 2:52 AM, Igor Ivanov <igor.iva...@itseez.com>
>>       wrote:
>> 
>>         It is the first MCA variable of type string from btl/openib,
>>         'device_param_files'. If you disable it, the failure occurs on
>>         the second one instead.
>> 
>>         Description of the case we see:
>>         1. openib MCA variables are registered during startup, at the
>>         component selection stage;
>>         2. but the cm component wins, and the openib MCA variables are
>>         deregistered as part of their MCA group;
>>         3. the MCA variables are not removed from the global MCA array, but
>>         they are marked as invalid and the memory for the string is freed;
>>         4. shmem needs openib for yoda and does bml initialization;
>>         5. openib MCA variables are registered again using "light mode":
>>         each one finds itself in the global array and refreshes its fields;
>> 
>>       Can you explain what you mean by step 5?  I.e., what does "using light
>>       mode" mean?  Is the openib component register function invoked again?
>> 
>>     That is correct, it is called twice. "Light mode" means that
>>     mca_base_var_register() does not allocate the MCA variable object
>>     again; it looks the variable up in the global array and, on finding it,
>>     updates fields in the mca_base_var_t structure (at least mbv_storage).
>> 
>>         6. for an unknown reason, bml finalization does not clean up these
>>         variables as is done in step 2;
>>         7. mca_btl_openib.so is unloaded;
>>         8. opal_finalize() destroys the MCA variables from the global array,
>>         encounters the openib variable, and tries to destroy it through an
>>         address that is no longer accessible;
>> 
>>         So the code under discussion fixes step 6.
>> 
>>       Nathan: it sounds like an MCA var (and entire group) is registered,
>>       unregistered, and then registered again. Does the MCA var system get
>>       confused here when it tries to unregister the group a 2nd time?
>> 
>>     The issue probably relates to incorrect detection of whether the
>>     variable is valid or invalid during the second call of
>>     mca_base_var_deregister().
>> 
>>     _______________________________________________
>>     devel mailing list
>>     de...@open-mpi.org
>>     http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

