On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
On Dec 4, 2013, at 2:52 AM, Igor Ivanov <igor.iva...@itseez.com> wrote:

The failure is on 'device_param_files', the first string-typed mca variable 
in btl/openib. If you disable it, you get the same failure on the second one.

Description of the case we see:
1. openib mca variables are registered during startup, at the component 
selection stage;
2. but the cm component wins, and the openib mca variables are deregistered as 
part of their mca group;
3. the mca variables are not removed from the global mca array, but they are 
marked as invalid and the memory for their strings is freed;
4. shmem needs openib for yoda and does bml initialization;
5. openib mca variables are registered again using light mode, i.e. each one 
finds itself in the global array and refreshes its fields;
Can you explain what you mean by step 5?  I.e., what does "using light mode" 
mean?  Is the openib component register function invoked again?
That is correct, it is called twice. "Light mode" means that mca_base_var_register() does not allocate the mca variable object again; it looks the variable up in the global array and, on finding it, updates the fields in the mca_base_var_t structure (at least mbv_storage).

6. for an unknown reason, bml finalization does not clean up these variables 
the way it is done in step 2;
7. mca_btl_openib.so is unloaded;
8. opal_finalize() destroys the mca variables from the global array, finds the 
openib variable, and tries to destroy it using an address that is no longer 
accessible;

So the code under discussion fixes step 6.
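
To make the sequence in steps 1 through 8 easier to follow, here is a minimal toy model in C. It is only a sketch and not the Open MPI code: toy_var_t, toy_register(), toy_deregister() and toy_count_valid() are hypothetical names, and the structure keeps just a name, a validity flag and a storage pointer that plays the role of mbv_storage. It assumes that deregistration keeps the slot in the global array and that a second registration reuses that slot.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mca_base_var_t; not the real definition. */
typedef struct {
    const char *name;   /* variable name, owned by the caller */
    char **storage;     /* plays the role of mbv_storage: points into the component */
    bool valid;         /* cleared on deregistration, slot is kept */
} toy_var_t;

#define TOY_MAX_VARS 8
static toy_var_t toy_vars[TOY_MAX_VARS];   /* stand-in for the global array */
static int toy_var_count = 0;

/* Return the slot index of a variable by name, or -1 if it was never registered. */
static int toy_find(const char *name)
{
    for (int i = 0; i < toy_var_count; i++) {
        if (strcmp(toy_vars[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Portable string copy (avoids depending on POSIX strdup). */
static char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* First call allocates a slot. A later call for the same name ("light mode")
 * reuses the slot and only refreshes its fields, including the storage pointer. */
int toy_register(const char *name, char **storage, const char *default_value)
{
    int idx = toy_find(name);
    if (idx < 0) {
        if (toy_var_count == TOY_MAX_VARS) {
            return -1;
        }
        idx = toy_var_count++;
    } else if (toy_vars[idx].valid) {
        free(*toy_vars[idx].storage);   /* drop the old value before replacing it */
    }
    char *value = toy_strdup(default_value);
    if (value == NULL) {
        return -1;
    }
    toy_vars[idx].name = name;
    toy_vars[idx].storage = storage;
    toy_vars[idx].valid = true;
    *storage = value;
    return idx;
}

/* Keep the slot, free the string, mark the variable invalid (steps 2 and 3).
 * Returns -1 if the variable is already invalid, so a second call is harmless. */
int toy_deregister(int idx)
{
    if (idx < 0 || idx >= toy_var_count || !toy_vars[idx].valid) {
        return -1;
    }
    free(*toy_vars[idx].storage);
    *toy_vars[idx].storage = NULL;
    toy_vars[idx].valid = false;
    return 0;
}

/* Number of variables a finalize pass would still try to destroy. */
int toy_count_valid(void)
{
    int n = 0;
    for (int i = 0; i < toy_var_count; i++) {
        if (toy_vars[i].valid) {
            n++;
        }
    }
    return n;
}
```

In this model, registering 'device_param_files', deregistering it and registering it again returns the same slot index and leaves one valid variable whose storage pointer refers to memory owned by the component. If the second deregistration is skipped (step 6) and the component is then unloaded (step 7), a finalize pass that walks the valid entries would dereference a pointer into unmapped memory, which is the situation in step 8.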
Nathan: it sounds like an MCA var (and entire group) is registered, 
unregistered, and then registered again. Does the MCA var system get confused 
here when it tries to unregister the group a 2nd time?
The issue probably relates to incorrect detection of whether the variable is valid or invalid during the second call of mca_base_var_deregister().
