Nathan, could you please comment on Igor's observations? Thanks.
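
For reference, here is a rough toy model of the sequence Igor describes in the quoted message (steps 1-8). It is only a sketch under my own assumptions: the `toy_*` names, the fixed-size array, and the "count instead of crash" finalize are all hypothetical and are not the real mca_base_var code. It only shows how a slot that is re-registered in "light mode" and never deregistered again is still marked valid when finalize runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy registry for illustration; NOT Open MPI's MCA var code. */
#define TOY_MAX_VARS 8

typedef struct {
    const char *name;  /* lookup key */
    char **storage;    /* points into the owning component (cf. mbv_storage) */
    bool valid;        /* cleared on deregistration; the slot itself is kept */
} toy_var_t;

static toy_var_t toy_vars[TOY_MAX_VARS];
static int toy_var_count = 0;

static char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}

/* First call allocates a slot. Later calls with the same name reuse the
 * slot and only refresh its fields (the "light mode" from the thread). */
static int toy_register(const char *name, char **storage, const char *value)
{
    int idx = -1;
    for (int i = 0; i < toy_var_count; i++) {
        if (strcmp(toy_vars[i].name, name) == 0) { idx = i; break; }
    }
    if (idx < 0) {
        if (toy_var_count == TOY_MAX_VARS) return -1;
        idx = toy_var_count++;
        toy_vars[idx].name = name;
    }
    *storage = toy_strdup(value);
    toy_vars[idx].storage = storage;
    toy_vars[idx].valid = true;
    return idx;
}

/* Frees the string and marks the slot invalid; the slot stays in the array. */
static int toy_deregister(int idx)
{
    if (idx < 0 || idx >= toy_var_count || !toy_vars[idx].valid) return -1;
    free(*toy_vars[idx].storage);
    *toy_vars[idx].storage = NULL;
    toy_vars[idx].storage = NULL;
    toy_vars[idx].valid = false;
    return 0;
}

/* Counts slots that are still valid at finalize time. In the real failure
 * these are the entries whose storage lives in an already unloaded library;
 * here we only count them (and release the strings) instead of crashing. */
static int toy_finalize(void)
{
    int stale = 0;
    for (int i = 0; i < toy_var_count; i++) {
        if (toy_vars[i].valid) {
            stale++;
            toy_deregister(i);
        }
    }
    toy_var_count = 0;
    return stale;
}

/* Replays steps 1-8; deregister_on_close models the fix for step 6. */
int toy_run_scenario(bool deregister_on_close)
{
    static char *device_param_files;  /* stands in for component memory */
    toy_var_count = 0;

    int first = toy_register("device_param_files", &device_param_files, "a.ini"); /* step 1 */
    toy_deregister(first);                                                        /* steps 2-3 */
    int second = toy_register("device_param_files", &device_param_files, "a.ini"); /* steps 4-5 */
    assert(second == first);          /* light mode reuses the same slot */

    if (deregister_on_close) {
        toy_deregister(second);       /* step 6 as it should behave */
    }
    /* step 7: the component library would be unloaded here */
    return toy_finalize();            /* step 8 */
}
```

Calling `toy_run_scenario(false)` from a small `main()` leaves one slot still valid at finalize, which is the entry the real code would then touch after the library is gone; `toy_run_scenario(true)` leaves none.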
On Wed, Dec 4, 2013 at 4:44 PM, Igor Ivanov <igor.iva...@itseez.com> wrote:

> On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
>
>> On Dec 4, 2013, at 2:52 AM, Igor Ivanov <igor.iva...@itseez.com> wrote:
>>
>>> It is the first mca variable of type string from btl/openib,
>>> 'device_param_files'. Actually, you can disable it and get the failure
>>> on the second one.
>>>
>>> Description of the case we see:
>>> 1. openib mca variables are registered during startup, at the select
>>> component phase;
>>> 2. but the winner is the cm component, and the openib mca variables are
>>> deregistered as part of the mca group;
>>> 3. the mca variables are not removed from the global mca array, but they
>>> are marked as invalid and the memory for the string is freed;
>>> 4. shmem needs openib for yoda and does bml initialization;
>>> 5. openib mca variables are registered again using light mode, i.e. by
>>> finding themselves in the global array and refreshing their fields;
>>>
>> Can you explain what you mean by step 5? I.e., what does "using light
>> mode" mean? Is the openib component register function invoked again?
>>
> That is correct, it is called twice. "Light mode" means that
> mca_base_var_register() does not allocate the mca variable object again;
> it looks the variable up in the global array and, on finding it, updates
> fields in the mca_base_var_t structure (at least mbv_storage).
>
>>> 6. for an unknown reason, bml finalization does not clean up these vars
>>> as is done in step 2;
>>> 7. mca_btl_openib.so is unloaded;
>>> 8. opal_finalize() destroys the mca variables from the global array,
>>> finds openib's variable, and tries to destroy it using an inaccessible
>>> address;
>>>
>>> So the code that is under discussion fixes step 6.
>>>
>> Nathan: it sounds like an MCA var (and entire group) is registered,
>> unregistered, and then registered again. Does the MCA var system get
>> confused here when it tries to unregister the group a 2nd time?
>>
> Probably the issue relates to incorrect recognition of whether a variable
> is valid or invalid during the second call of mca_base_var_deregister().
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel