Igor - Sorry for the slow reply; I was on vacation for the last week and a half.
The patch doesn't look quite right to me. If the cm PML is used, the spml (or something else in the OSHMEM layer) is going to have to call add_procs on the BML to initialize the procs arrays for the BTLs.

Brian

On 12/23/13 3:49 AM, "Igor Ivanov" <igor.iva...@itseez.com> wrote:

>Brian,
>
>Could you look at the patch based on your suggestion? It resolves the issue
>with the MCA variable.
>
>Igor
>
>On 18.12.2013 01:48, Barrett, Brian W wrote:
>> The proposed solution at the bottom is wrong. There aren't two different
>> BMLs; there's one, and it lives in OMPI.
>>
>> The solution is to open the bml and btls in ompi_mpi_init and not in the
>> pmls. I checked, and the bml will deal with add_procs being called
>> multiple times on the same proc, so just moving the framework open / init
>> is sufficient. This will also solve the MTL problem.
>>
>> Brian
>>
>> On 12/17/13 8:33 AM, "Joshua Ladd" <josh...@mellanox.com> wrote:
>>
>>> I believe Devendar Bureddy nailed the root cause. I am providing his
>>> excellent analysis below:
>>>
>>> From Devendar:
>>> Out of curiosity I looked at this issue; here are my 2 cents.
>>> I think the issue is that the BTL components are opened and closed
>>> twice (ompi_init, yoda), which leads to incorrect usage of var groups.
>>> The following sequence of events leaves an invalid memory reference:
>>>
>>> 1) All openib component parameters are registered in ompi_mpi_init:
>>>    main -> start_pes -> shmem_init -> oshmem_shmem_init -> ompi_mpi_init
>>>    -> mca_base_framework_open -> mca_pml_base_open ... mca_bml_base_open
>>>    ... -> btl_openib_component_register()
>>>
>>>    * For all string variables, a memory block is allocated
>>>      (var->mbv_storage = PTR).
>>>
>>>    At this time a new var group (id 114, parent group id 112) is created
>>>    for all openib component variables.
>>>
>>> 2) This var group is deregistered in ompi_mpi_init, which marks all of
>>>    its variables as invalid; however, the group and its variables still
>>>    exist:
>>>    main -> start_pes -> shmem_init -> oshmem_shmem_init
>>>    -> mca_pml_base_select -> mca_base_components_close -> ...
>>>    -> mca_bml_base_close -> mca_base_framework_close
>>>    -> mca_base_var_group_deregister(groupid: 114)
>>>
>>>    * All string variable memory is deallocated
>>>      (set var->mbv_storage = NULL;).
>>>
>>> 3) Because of step 2, the btl_openib.so shared library is dlclose()d.
>>>
>>> 4) Now we reopen openib in yoda and register the openib variables again:
>>>    main -> start_pes -> shmem_init -> oshmem_shmem_init -> _shmem_init
>>>    -> mca_base_framework_open -> mca_spml_base_open
>>>    -> mca_spml_yoda_component_open -> ... -> mca_bml_base_open ...
>>>    -> btl_openib_component_register -> register_variables()
>>>
>>>    * In register_variables(), var_find() finds this variable (from the
>>>      same old group, 114) and resets it.
>>>    * For string variables, the buffers are allocated again
>>>      (var->mbv_storage = PTR).
>>>    * Note that group 114 does not belong to the yoda component.
>>>
>>> 5) When the yoda component is closed, it never finds the above group
>>>    (114) because the group does not belong to this component, so
>>>    mca_base_var_group_deregister() is not called again on the var group
>>>    and the string variable memory is not deallocated:
>>>    main -> start_pes -> shmem_init -> oshmem_shmem_init -> _shmem_init
>>>    -> mca_spml_base_select -> ... -> mca_spml_yoda_component_close
>>>    -> mca_bml_base_close -> mca_base_var_group_find()
>>>
>>> 6) Because of step 5, when btl_openib.so is dlclose()d, all of the
>>>    openib string variable memory (var->mbv_storage = PTR) set up in
>>>    step 4 becomes invalid.
>>>
>>> 7) ompi_mpi_finalize() loops through all vars, finalizing them and
>>>    deallocating the string variable memory (var->mbv_storage = PTR):
>>>    ompi_mpi_finalize -> ... -> mca_base_var_finalize
>>>
>>>    * var->mbv_storage = PTR is invalid at this stage, which causes the
>>>      SEGFAULT.
>>>
>>>
>>> This also explains why Dinar's patch, kostul_fix.patch
>>> (http://bgate.mellanox.com/redmine/attachments/1643/kostul_fix.patch),
>>> resolves the issue: his patch prevents the already-opened (and now
>>> invalid) parameters from being found.
>>>
>>> I see that in a lot of these registration functions the signature has an
>>> entry for the project name, but right now NULL is always passed. I see a
>>> note by Nathan in ../opal/mca/base/mca_base_var.c +1311:
>>>
>>>   {
>>>       /* XXX -- component_update -- We will stash the project name in the component */
>>>       return mca_base_var_register (NULL, component->mca_type_name,
>>>
>>> It seems that knowing the project name, oshmem, would allow us to
>>> distinguish between the different BMLs.
>>>
>>> Nathan, please advise.
>>>
>>> Josh
>>>
>>>
>>> -----Original Message-----
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>>> Sent: Monday, December 16, 2013 12:44 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>
>>> On Mon, Dec 16, 2013 at 05:21:05PM +0000, Joshua Ladd wrote:
>>>> After speaking with Igor Ivanov about this this morning, he summarized
>>>> his findings as follows:
>>>>
>>>> 1. Valgrind comes up clean.
>>> That's good to hear, but unfortunate, since this really seems like a
>>> stomping-on-memory problem.
>>>
>>>> 2. The issue is not reproduced with a static build.
>>> This is a red herring. The variable itself contains garbage. The
>>> mbv_storage pointer looked like it was on the stack, the name was not
>>> valid, etc. Not sure how we got an mca_base_var_t into that state, since
>>> the only time we touch anything in them is in mca_base_var_finalize. That
>>> function cleans up all of the state, so two calls to it should be
>>> harmless.
>>>
>>>> 3. A bisection study reveals that problems first appear after commit:
>>>> https://svn.open-mpi.org/trac/ompi/changeset/28800/trunk/opal/mca/base/mca_base_var.c
>>> Possibly also a coincidence.
>>> That commit only 1) moves the group stuff into its own file, and 2) adds
>>> the mca_base_pvar interface. It's possible I messed something up in the
>>> rest of the code, but unlikely. I will take another look, though.
>>>
>>> -Nathan
>>>
>>>>
>>>> Josh
>>>>
>>>> -----Original Message-----
>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
>>>> Sent: Monday, December 16, 2013 12:15 PM
>>>> To: Open MPI Developers
>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>
>>>> It might be worthwhile to run this through valgrind and see if
>>>> something is being freed incorrectly...?
>>>>
>>>>
>>>> On Dec 16, 2013, at 12:11 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>>>>
>>>>> I took a look at the stacktraces last week and could not identify
>>>>> where the bug is. I will dig deeper this week and see if I can come up
>>>>> with the correct fix.
>>>>> -Nathan
>>>>>
>>>>> On Mon, Dec 09, 2013 at 03:17:36PM +0200, Mike Dubman wrote:
>>>>>> Nathan,
>>>>>> Could you please comment on Igor's observations?
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, Dec 4, 2013 at 4:44 PM, Igor Ivanov <igor.iva...@itseez.com> wrote:
>>>>>>
>>>>>> On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
>>>>>>
>>>>>> On Dec 4, 2013, at 2:52 AM, Igor Ivanov <igor.iva...@itseez.com> wrote:
>>>>>>
>>>>>> It is the first string-typed MCA variable from btl/openib,
>>>>>> 'device_param_files'. Actually, you can disable it and get the failure
>>>>>> on the second one.
>>>>>>
>>>>>> Description of the case we see:
>>>>>> 1. openib MCA variables are registered during startup, at the
>>>>>>    component selection stage;
>>>>>> 2. but the winner is the cm component, so the openib MCA variables are
>>>>>>    deregistered as part of their MCA group;
>>>>>> 3. the MCA variables are not removed from the global MCA array, but
>>>>>>    they are marked as invalid and their string memory is freed;
>>>>>> 4. shmem needs openib for yoda and does BML initialization;
>>>>>> 5. the openib MCA variables are registered again using "light mode":
>>>>>>    each variable finds itself in the global array and refreshes its
>>>>>>    fields;
>>>>>>
>>>>>> Can you explain what you mean by step 5? I.e., what does "using light
>>>>>> mode" mean? Is the openib component register function invoked again?
>>>>>>
>>>>>> Correct, it is called twice. "Light mode" means that
>>>>>> mca_base_var_register() does not allocate the MCA variable object
>>>>>> again; it looks the variable up in the global array and, having found
>>>>>> it, updates the fields in the mca_base_var_t structure (at least
>>>>>> mbv_storage).
>>>>>>
>>>>>> 6. for an unknown reason, BML finalization does not clean up these
>>>>>>    vars as it did in step 2;
>>>>>> 7. mca_btl_openib.so is unloaded;
>>>>>> 8. opal_finalize() destroys the MCA variables in the global array,
>>>>>>    encounters openib's variable, and tries to destroy it using an
>>>>>>    inaccessible address;
>>>>>>
>>>>>> So the code under discussion fixes step 6.
>>>>>>
>>>>>> Nathan: it sounds like an MCA var (and entire group) is registered,
>>>>>> unregistered, and then registered again. Does the MCA var system get
>>>>>> confused here when it tries to unregister the group a 2nd time?
>>>>>>
>>>>>> The issue probably relates to incorrect recognition of whether a
>>>>>> variable is valid or invalid during the second call of
>>>>>> mca_base_var_deregister().
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
>

--
Brian W. Barrett
Scalable System Software Group
Sandia National Laboratories