If a process is using the Portals 4 MTL and calls shmem_init(), the BTLs will be initialized properly, but as of right now, no one will call add_procs() on the BML (which in turn calls add_procs() on the BTLs). So the first shmem communication will fail, because the proc lookup inside the BTL will fail. If the MPI layer doesn't call add_procs(), someone else has to. In this case, that someone else is the OpenSHMEM layer.
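To make the failure mode concrete, here is a rough sketch in C. It is a toy model, not Open MPI code: every name in it (toy_bml_t, toy_bml_add_procs, toy_btl_put, toy_mpi_init, toy_oshmem_init) is hypothetical, and it only assumes what the thread states, namely that a BTL lookup fails for a proc that was never added and that calling add_procs twice on the same proc is harmless.

```c
/* Toy model only: none of these names are Open MPI APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PROCS 8

/* Stands in for the per-proc state that add_procs sets up in the BTLs. */
typedef struct {
    bool known[TOY_MAX_PROCS];
    int  add_calls;              /* number of procs actually added */
} toy_bml_t;

/* Idempotent: a proc that is already known is skipped. */
static void toy_bml_add_procs(toy_bml_t *bml, const int *ranks, size_t n)
{
    assert(bml != NULL && ranks != NULL);
    for (size_t i = 0; i < n; i++) {
        int r = ranks[i];
        if (r < 0 || r >= TOY_MAX_PROCS || bml->known[r]) {
            continue;
        }
        bml->known[r] = true;
        bml->add_calls++;
    }
}

/* Returns 0 on success, -1 when the proc lookup fails. */
static int toy_btl_put(const toy_bml_t *bml, int rank)
{
    if (rank < 0 || rank >= TOY_MAX_PROCS || !bml->known[rank]) {
        return -1;
    }
    return 0;
}

/* MPI-layer init: only a BTL-based PML adds procs; an MTL-based one does not. */
static void toy_mpi_init(toy_bml_t *bml, bool pml_uses_btls,
                         const int *ranks, size_t n)
{
    if (pml_uses_btls) {
        toy_bml_add_procs(bml, ranks, n);
    }
}

/* OSHMEM-layer init: always adds procs; a no-op if the PML already did. */
static void toy_oshmem_init(toy_bml_t *bml, const int *ranks, size_t n)
{
    toy_bml_add_procs(bml, ranks, n);
}
```

With pml_uses_btls set to false, toy_btl_put() fails until toy_oshmem_init() has run; with it set to true, the second add is simply skipped, which is why having the OSHMEM layer call it unconditionally should be safe.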
Brian

On 1/15/14 7:45 AM, "Igor Ivanov" <igor.iva...@itseez.com> wrote:

>Brian,
>
>Sorry for slow reaction.
>I am not sure I understand your concern. Could you please make it
>clearer and review modified patch (I have figured out issue in my
>previous patch as absence of complete btl initialization in case PML
>components different from bfo and ob1 needed for OSHMEM.)
>
>Igor
>
>On 03.01.2014 00:04, Barrett, Brian W wrote:
>> Igor -
>>
>> Sorry for the slow reply; I was on vacation for the last week and a
>> half.
>>
>> The patch doesn't look quite right to me. If the cm PML is used, the
>> spml (or something else in the OSHMEM layer) is going to have to call
>> add_procs on the BML to initialize the procs arrays for the BTLs.
>>
>> Brian
>>
>> On 12/23/13 3:49 AM, "Igor Ivanov" <igor.iva...@itseez.com> wrote:
>>
>>> Brian,
>>>
>>> Could you look at patch based on your suggestion. It resolves the issue
>>> with mca variable.
>>>
>>> Igor
>>>
>>> On 18.12.2013 01:48, Barrett, Brian W wrote:
>>>> The proposed solution at the bottom is wrong. There aren't two
>>>> different BMLs, there's one, and it lives in OMPI.
>>>>
>>>> The solution is to open the bml and btls in ompi_mpi_init and not in
>>>> the pmls. I checked, and the bml will deal with add_procs being called
>>>> multiple times on the same proc, so just moving the framework open /
>>>> init is sufficient. This will also solve the MTL problem.
>>>>
>>>> Brian
>>>>
>>>> On 12/17/13 8:33 AM, "Joshua Ladd" <josh...@mellanox.com> wrote:
>>>>
>>>>> I believe Devendar Bureddy nailed the root cause. I am providing his
>>>>> excellent analysis below:
>>>>>
>>>>> From Devendar:
>>>>> with curiosity i looked at this issue. here's my 2 cents
>>>>> I think issue is because of BTL components is opened&closed
>>>>> twice(ompi_init, yoda) which leading to incorrect usage of var
>>>>> groups.
>>>>> The following sequence of events creating invalid memory
>>>>>
>>>>> 1) all openib component parameters registered in ompi_mpi_init
>>>>> main > start_pes> shmem_init -> oshmem_shmem_init -> ompi_mpi_init ->
>>>>> mca_base_framework_open -> mca_pml_base_open ..... mca_bml_base_open...
>>>>> -> btl_openib_component_register()
>>>>>
>>>>> * for all string variables it allocated a memory block
>>>>> (var->mbv_storage = PTR)
>>>>>
>>>>> At this time a new var group id:114 (of parent group id: 112) is
>>>>> created for all openib component variables.
>>>>>
>>>>> 2) This var group is de-registered in ompi_mpi_init. It marks all
>>>>> variables as invalid. but, the group&vars is still exist
>>>>> main > start_pes> shmem_init -> oshmem_shmem_init ->
>>>>> mca_pml_base_select
>>>>> -> mca_base_components_close -> ... -> mca_bml_base_close ->
>>>>> mca_base_framework_close -> mca_base_var_group_deregister(groupid: 114)
>>>>> * all string variables memory is deallocated ( set var->mbv_storage =
>>>>> NULL;)
>>>>>
>>>>> 3) because of step 2). btl_openib.so shared lib dlclosed
>>>>>
>>>>> 4) Now we are reopening openib in yoda and registering the openib
>>>>> variables again.
>>>>> main > start_pes> shmem_init > oshmem_shmem_init -> _shmem_init ->
>>>>> mca_base_framework_open -> mca_spml_base_open>
>>>>> mca_spml_yoda_component_open-> ..... mca_bml_base_open... ->
>>>>> btl_openib_component_register -> register_variables()
>>>>>
>>>>> * In register_variables(), var_find() finds this variable( from the
>>>>> same old group: 114) and reset the variables.
>>>>> * For string variables, it allocated the buffers again (
>>>>> (var->mbv_storage = PTR)
>>>>> * note that group:114 is not belongs to yoda component.
>>>>>
>>>>> 5) In yoda component close, it never finds above group(114) because
>>>>> this is not belongs to this component. So, do not call
>>>>> mca_base_var_group_deregister() again on the var group.
>>>>> string var memory is not deallocated.
>>>>> main > start_pes> shmem_init > oshmem_shmem_init -> _shmem_init ->
>>>>> mca_spml_base_select ->..> mca_spml_yoda_component_close ->
>>>>> mca_bml_base_close -> mca_base_var_group_find().
>>>>>
>>>>> 6) because of step 5), the btl_openib.so is dlclosed(). This step
>>>>> invalidates, all openib string vars memory ( var->mbv_storage = PTR)
>>>>> allocated in step 4)
>>>>>
>>>>> 7) in ompi_mpi_finalize(), it will loop through all vars and finalizes
>>>>> and deallocate the string var memory (var->mbv_storage = PTR)
>>>>> ompi_mpi_finalize >...> mca_base_var_finalize * var->mbv_storage = PTR
>>>>> is invalid at this stage and causing the SEGFAULT.
>>>>>
>>>>>
>>>>> This also explains why Dinar's patch, kostul_fix.patch
>>>>> (http://bgate.mellanox.com/redmine/attachments/1643/kostul_fix.patch),
>>>>> resolves the issue. His patch prevents you from finding the invalid
>>>>> already opened params.
>>>>> So, I see in a lot of these registration functions the signature has an
>>>>> entry for the project name, but now, NULL, is always passed. I see a
>>>>> note by Nathan in
>>>>>
>>>>> ../opal/mca/base/mca_base_var.c +1311
>>>>> {
>>>>> /* XXX -- component_update -- We will stash the project name in the
>>>>> component */
>>>>> return mca_base_var_register (NULL, component->mca_type_name,
>>>>>
>>>>>
>>>>> Seems knowing the project name, oshmem, would allow us to distinguish
>>>>> between the different BMLs.
>>>>>
>>>>> Nathan, please advise.
>>>>>
>>>>> Josh
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan
>>>>> Hjelm
>>>>> Sent: Monday, December 16, 2013 12:44 PM
>>>>> To: Open MPI Developers
>>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>>
>>>>> On Mon, Dec 16, 2013 at 05:21:05PM +0000, Joshua Ladd wrote:
>>>>>> After speaking with Igor Ivanov about this this morning, he summarized
>>>>>> his findings as follows:
>>>>>>
>>>>>> 1. Valgrind comes up clean.
>>>>> Thats good to hear but unfortunate since this seems really like a
>>>>> stomping-on-memory problem.
>>>>>
>>>>>> 2. The issue is not reproduced with a static build.
>>>>> This is a red-herring. The variable itself contains garbage. The
>>>>> mbv_storage pointer looked like it was on the stack, the name was not
>>>>> valid, etc. Not sure how we got an mca_base_var_t into that state since
>>>>> the only time we touch anything in them is in mca_base_var_finalize.
>>>>> That functions cleans up all of the state to two calls to it should be
>>>>> harmless.
>>>>>
>>>>>> 3. A bisection study reveals that problems first appear after commit:
>>>>>> https://svn.open-mpi.org/trac/ompi/changeset/28800/trunk/opal/mca/base/mca_base_var.c
>>>>> Possibly also a coincidence. That commit only 1) moves the group stuff
>>>>> into its own file, and 2) adds the mca_base_pvar interface. Its
>>>>> possible I messed something up in the rest of the code but unlikely. I
>>>>> will take another look though.
>>>>>
>>>>> -Nathan
>>>>>
>>>>>> Josh
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff
>>>>>> Squyres (jsquyres)
>>>>>> Sent: Monday, December 16, 2013 12:15 PM
>>>>>> To: Open MPI Developers
>>>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>>>
>>>>>> It might be worthwhile to run this through valgrind and see if
>>>>>> something is being freed incorrectly...?
>>>>>>
>>>>>>
>>>>>> On Dec 16, 2013, at 12:11 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>>>>>>
>>>>>>> I took a look at the stacktraces last week and could not identify
>>>>>>> where the bug is.
>>>>>>> I will dig deeper this week and see if I can come up with the
>>>>>>> correct fix.
>>>>>>> -Nathan
>>>>>>>
>>>>>>> On Mon, Dec 09, 2013 at 03:17:36PM +0200, Mike Dubman wrote:
>>>>>>>> Nathan,
>>>>>>>> Could you please comment on the Igor`s observations?
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Wed, Dec 4, 2013 at 4:44 PM, Igor Ivanov <igor.iva...@itseez.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
>>>>>>>>
>>>>>>>> On Dec 4, 2013, at 2:52 AM, Igor Ivanov <igor.iva...@itseez.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> It is the first mca variable with type as string from btl/openib as
>>>>>>>> 'device_param_files'. Actually you can disable it and get failure on
>>>>>>>> the second.
>>>>>>>>
>>>>>>>> Description of case we see:
>>>>>>>> 1. openib mca variables are registered during startup as stage at
>>>>>>>>    select component phase;
>>>>>>>> 2. but a winner is cm component and openib mca variables are
>>>>>>>>    deregistered as part of mca group;
>>>>>>>> 3. mca variables are not removed from global mca array but they
>>>>>>>>    marked as invalid and memory for string is freed;
>>>>>>>> 4. shmem needs openib for yoda and does bml initialization;
>>>>>>>> 5. openib mca variables are registered again using light mode as
>>>>>>>>    searching itself in global array and refreshing their fields again;
>>>>>>>>
>>>>>>>> Can you explain what you mean by step 5? I.e., what does "using light
>>>>>>>> mode" mean? Is the openib component register function invoked again?
>>>>>>>> It is correct, it is called twice. "light mode" means that
>>>>>>>> mca_base_var_register() does not allocate mca variable object again, it
>>>>>>>> seeks this variable in global array and finding it updates fields in
>>>>>>>> mca_base_var_t structure (at least mbv_storage).
>>>>>>>>
>>>>>>>> 6. for unknown reason bml finalization does not clean these vars as
>>>>>>>>    it is done in step 2;
>>>>>>>> 7. mca_btl_openib.so is unloaded;
>>>>>>>> 8. opal_finalize() destroys mca variables form global array,
>>>>>>>>    observes openib`s variable, try destroy using non accessed
>>>>>>>>    address;
>>>>>>>>
>>>>>>>> So a code that is under discussion fixes step 6.
>>>>>>>>
>>>>>>>> Nathan: it sounds like an MCA var (and entire group) is registered,
>>>>>>>> unregistered, and then registered again. Does the MCA var system get
>>>>>>>> confused here when it tries to unregister the group a 2nd time?
>>>>>>>>
>>>>>>>> Probably issue relates incorrect recognition if variable valid/invalid
>>>>>>>> during second call of mca_base_var_deregister().
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> de...@open-mpi.org
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> jsquy...@cisco.com
>>>>>> For corporate legal information go to:
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>
>>>>>
>>>> --
>>>> Brian W. Barrett
>>>> Scalable System Software Group
>>>> Sandia National Laboratories
>>>>
>>>
>>
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
>>
>

--
Brian W. Barrett
Scalable System Software Group
Sandia National Laboratories
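
For readers following the quoted analysis, the register / deregister / re-register sequence can be approximated with a small toy model in C. This is only a sketch under stated assumptions, not the mca_base_var implementation: all types and functions here (toy_component_t, toy_var_t, toy_var_register, toy_group_deregister, toy_component_unload, toy_count_dangling) are hypothetical, the component unload is simulated with a flag instead of a real dlclose(), and the group id 200 used for the second registration is an invented stand-in for "the group the yoda component looks for".

```c
/* Toy model only: hypothetical names, not the mca_base_var code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stands in for a dlopen()ed component that owns the string storage. */
typedef struct {
    bool loaded;                 /* false after the simulated dlclose() */
} toy_component_t;

typedef struct {
    const char      *name;
    int              group_id;
    bool             valid;
    toy_component_t *owner;      /* where the storage pointer leads */
} toy_var_t;

#define TOY_MAX_VARS 4
static toy_var_t toy_vars[TOY_MAX_VARS];
static size_t    toy_var_count;

/* "Light mode": an existing entry with the same name is reused and only
 * its storage is refreshed; its group id keeps the old value. */
static toy_var_t *toy_var_register(const char *name, int group_id,
                                   toy_component_t *owner)
{
    for (size_t i = 0; i < toy_var_count; i++) {
        if (strcmp(toy_vars[i].name, name) == 0) {
            toy_vars[i].valid = true;
            toy_vars[i].owner = owner;
            return &toy_vars[i];
        }
    }
    assert(toy_var_count < TOY_MAX_VARS);
    toy_var_t *v = &toy_vars[toy_var_count++];
    *v = (toy_var_t){ name, group_id, true, owner };
    return v;
}

/* Deregistering a group invalidates its variables and drops their storage. */
static void toy_group_deregister(int group_id)
{
    for (size_t i = 0; i < toy_var_count; i++) {
        if (toy_vars[i].group_id == group_id) {
            toy_vars[i].valid = false;
            toy_vars[i].owner = NULL;
        }
    }
}

/* Simulated dlclose(): storage inside the component is no longer usable. */
static void toy_component_unload(toy_component_t *c)
{
    c->loaded = false;
}

/* Counts variables whose storage still points into an unloaded component;
 * a finalize pass touching such storage is where a crash would occur. */
static size_t toy_count_dangling(void)
{
    size_t n = 0;
    for (size_t i = 0; i < toy_var_count; i++) {
        if (toy_vars[i].owner != NULL && !toy_vars[i].owner->loaded) {
            n++;
        }
    }
    return n;
}
```

In this model the first open/close cycle is clean because the group that was created is also the group that gets deregistered. The second cycle reuses the old entry, so a deregister keyed on a different group never clears the storage, and unloading the component leaves one dangling variable for the finalize pass to trip over.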