Igor -

Sorry for the slow reply; I was on vacation for the last week and a half.

The patch doesn't look quite right to me.  If the cm PML is used, the spml
(or something else in the OSHMEM layer) is going to have to call add_procs
on the BML to initialize the procs arrays for the BTLs.
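
A minimal sketch of the add_procs point above. It uses simplified stand-ins: sketch_proc_t, sketch_bml_add_procs, sketch_pml_add_procs and sketch_spml_add_procs are hypothetical names, not the real BML/PML/SPML interfaces. The BML skips procs it has already set up. With the cm PML, the SPML's call is therefore what initializes the BTL state. With ob1, the SPML's second call is a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real OMPI/OSHMEM types or API. */
typedef struct {
    int rank;
    int btl_ready;   /* nonzero once the BML has set up BTL state for this proc */
} sketch_proc_t;

/* Number of times per-proc BTL state has been built (for illustration). */
static int btl_init_count = 0;

/* Hypothetical BML add_procs: sets up each proc at most once, so repeated
 * calls (e.g. from a PML and then from an SPML) are harmless. */
static int sketch_bml_add_procs(sketch_proc_t *procs, size_t nprocs)
{
    assert(procs != NULL || nprocs == 0);
    for (size_t i = 0; i < nprocs; ++i) {
        if (procs[i].btl_ready) {
            continue;               /* already added by an earlier caller */
        }
        procs[i].btl_ready = 1;     /* stand-in for filling the procs arrays */
        ++btl_init_count;
    }
    return 0;
}

/* Hypothetical PML step: ob1 drives the BML, cm (MTL-based) does not. */
static int sketch_pml_add_procs(sketch_proc_t *procs, size_t nprocs, int pml_is_cm)
{
    return pml_is_cm ? 0 : sketch_bml_add_procs(procs, nprocs);
}

/* Hypothetical SPML step: always asks the BML, so the BTLs are usable even
 * when the cm PML skipped add_procs. */
static int sketch_spml_add_procs(sketch_proc_t *procs, size_t nprocs)
{
    return sketch_bml_add_procs(procs, nprocs);
}
```

Marking readiness on the proc itself is just one way to make repeat calls harmless. The real BML may track this differently.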

Brian

On 12/23/13 3:49 AM, "Igor Ivanov" <igor.iva...@itseez.com> wrote:

>Brian,
>
>Could you look at the patch based on your suggestion? It resolves the issue
>with the MCA variable.
>
>Igor
>
>On 18.12.2013 01:48, Barrett, Brian W wrote:
>> The proposed solution at the bottom is wrong.  There aren't two different
>> BMLs; there's one, and it lives in OMPI.
>>
>> The solution is to open the bml and btls in ompi_mpi_init and not in the
>> pmls.  I checked, and the bml will deal with add_procs being called
>> multiple times on the same proc, so just moving the framework open/init
>> is sufficient.  This will also solve the MTL problem.
>>
>> Brian
>>
>> On 12/17/13 8:33 AM, "Joshua Ladd" <josh...@mellanox.com> wrote:
>>
>>> I believe Devendar Bureddy nailed the root cause. I am providing his
>>> excellent analysis below:
>>>
>>> From Devendar:
>>> Out of curiosity I looked at this issue; here are my 2 cents.
>>> I think the issue is that the BTL components are opened and closed
>>> twice (ompi_init, yoda), which leads to incorrect usage of var groups.
>>> The following sequence of events produces an invalid memory reference:
>>>
>>> 1) All openib component parameters are registered in ompi_mpi_init:
>>> main -> start_pes -> shmem_init -> oshmem_shmem_init -> ompi_mpi_init ->
>>> mca_base_framework_open -> mca_pml_base_open -> ... -> mca_bml_base_open
>>> -> ... -> btl_openib_component_register()
>>>
>>> *   For every string variable, a memory block is allocated
>>>     (var->mbv_storage = PTR).
>>>
>>> At this time, a new var group (id 114, parent group id 112) is created
>>> for all openib component variables.
>>>
>>> 2) This var group is deregistered during ompi_mpi_init. That marks all
>>> of its variables as invalid, but the group and its vars still exist.
>>> main -> start_pes -> shmem_init -> oshmem_shmem_init -> mca_pml_base_select
>>> -> mca_base_components_close -> ... -> mca_bml_base_close ->
>>> mca_base_framework_close -> mca_base_var_group_deregister(groupid: 114)
>>> *   All string variable memory is deallocated (var->mbv_storage = NULL
>>>     is set).
>>>
>>> 3) Because of step 2, the btl_openib.so shared library is dlclosed.
>>>
>>> 4) Now we reopen openib in yoda and register the openib variables
>>> again.
>>> main -> start_pes -> shmem_init -> oshmem_shmem_init -> _shmem_init ->
>>> mca_base_framework_open -> mca_spml_base_open ->
>>> mca_spml_yoda_component_open -> ... -> mca_bml_base_open -> ... ->
>>> btl_openib_component_register -> register_variables()
>>>
>>> *   In register_variables(), var_find() finds each variable (from the
>>>     same old group 114) and resets it.
>>> *   For string variables, the buffers are allocated again
>>>     (var->mbv_storage = PTR).
>>> *   Note that group 114 does not belong to the yoda component.
>>>
>>> 5) On yoda component close, it never finds the above group (114),
>>> because that group does not belong to this component. So
>>> mca_base_var_group_deregister() is not called again on the var group,
>>> and the string var memory is not deallocated.
>>> main -> start_pes -> shmem_init -> oshmem_shmem_init -> _shmem_init ->
>>> mca_spml_base_select -> ... -> mca_spml_yoda_component_close ->
>>> mca_bml_base_close -> mca_base_var_group_find()
>>>
>>> 6) Because of step 5, btl_openib.so is dlclosed() while those variables
>>> are still registered. This invalidates all of the openib string var
>>> memory (var->mbv_storage = PTR) allocated in step 4.
>>>
>>> 7) In ompi_mpi_finalize(), we loop through all vars, finalizing them and
>>> deallocating the string var memory (var->mbv_storage = PTR).
>>> ompi_mpi_finalize -> ... -> mca_base_var_finalize
>>> *   var->mbv_storage = PTR is invalid at this stage, which causes the
>>>     SEGFAULT.
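
The sequence above can be replayed with a small, self-contained model. This is a hypothetical sketch, not the real mca_base_var code. fake_module_t stands in for btl_openib.so, and clearing `loaded` mimics dlclose(). fake_var_t stands in for mca_base_var_t. Group 120 is an invented id for yoda's own group. In the real process the step-7 dereference hits unmapped memory; the model can only count it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the failure; NOT the real mca_base_var code. */
typedef struct {
    int loaded;      /* cleared to mimic dlclose() of btl_openib.so */
    char *param;     /* a component-static char * that lives inside the .so */
} fake_module_t;

typedef struct {
    const char *name;
    int group;               /* var group id, e.g. 114 */
    int valid;
    fake_module_t *owner;
    char **storage;          /* like mbv_storage: points INTO the module */
} fake_var_t;

#define MAX_VARS 8
static fake_var_t vars[MAX_VARS];
static int nvars = 0;

static char *copy_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    assert(p != NULL);
    memcpy(p, s, len);
    return p;
}

/* "Light mode": an existing entry is reused (keeping its ORIGINAL group)
 * and only its storage is refreshed. */
static void var_register(const char *name, int group, fake_module_t *m,
                         const char *def)
{
    fake_var_t *v = NULL;
    for (int i = 0; i < nvars; ++i) {
        if (strcmp(vars[i].name, name) == 0) { v = &vars[i]; break; }
    }
    if (v == NULL) {
        assert(nvars < MAX_VARS);
        v = &vars[nvars++];
        v->name = name;
        v->group = group;
    }
    v->owner = m;
    v->storage = &m->param;
    *v->storage = copy_str(def);
    v->valid = 1;
}

/* Frees string storage and marks vars invalid; entries stay in the array. */
static void group_deregister(int group)
{
    for (int i = 0; i < nvars; ++i) {
        fake_var_t *v = &vars[i];
        if (v->group == group && v->valid) {
            free(*v->storage);
            *v->storage = NULL;
            v->valid = 0;
        }
    }
}

/* Returns how many valid vars point into an unloaded module, i.e. how many
 * dereferences would segfault in the real process. */
static int var_finalize(void)
{
    int faults = 0;
    for (int i = 0; i < nvars; ++i) {
        fake_var_t *v = &vars[i];
        if (!v->valid) continue;
        if (!v->owner->loaded) ++faults;
        free(*v->storage);   /* the model can still reach this memory */
        *v->storage = NULL;
        v->valid = 0;
    }
    nvars = 0;
    return faults;
}

/* Replays steps 1-7. If second_close_finds_group is nonzero, the yoda close
 * also deregisters group 114, i.e. the cleanup that step 5 skips. */
static int replay(int second_close_finds_group)
{
    fake_module_t openib = { 1, NULL };                      /* loaded */
    var_register("device_param_files", 114, &openib, "");    /* step 1 */
    group_deregister(114);                                   /* step 2 */
    openib.loaded = 0;                                       /* step 3 */
    openib.loaded = 1;                                       /* step 4 */
    var_register("device_param_files", 120, &openib, "");    /* stays in 114 */
    group_deregister(second_close_finds_group ? 114 : 120);  /* step 5 */
    openib.loaded = 0;                                       /* step 6 */
    return var_finalize();                                   /* step 7 */
}
```

replay(0) follows steps 1-7 and reports one would-be fault. replay(1), where the second close also deregisters group 114, leaves nothing valid for finalize to touch.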
>>>
>>>
>>> This also explains why Dinar's patch, kostul_fix.patch
>>> (http://bgate.mellanox.com/redmine/attachments/1643/kostul_fix.patch),
>>> resolves the issue: it prevents the lookup from finding the invalid,
>>> already-opened params.
>>> I see that many of these registration functions have an entry for the
>>> project name in their signature, but NULL is currently always passed.
>>> I see a note by Nathan in
>>>
>>> ../opal/mca/base/mca_base_var.c +1311
>>> {
>>> /* XXX -- component_update -- We will stash the project name in the component */
>>> return mca_base_var_register (NULL, component->mca_type_name,
>>>
>>>
>>> It seems that knowing the project name, oshmem, would allow us to
>>> distinguish between the different BMLs.
>>>
>>> Nathan, please advise.
>>>
>>> Josh
>>>
>>>
>>> -----Original Message-----
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>>> Sent: Monday, December 16, 2013 12:44 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>
>>> On Mon, Dec 16, 2013 at 05:21:05PM +0000, Joshua Ladd wrote:
>>>> After speaking with Igor Ivanov about this this morning, he summarized
>>>> his findings as follows:
>>>>
>>>> 1. Valgrind comes up clean.
>>> That's good to hear, but unfortunate, since this really seems like a
>>> stomping-on-memory problem.
>>>
>>>> 2. The issue is not reproduced with a static build.
>>> This is a red herring. The variable itself contains garbage: the
>>> mbv_storage pointer looked like it was on the stack, the name was not
>>> valid, etc. I'm not sure how we got an mca_base_var_t into that state,
>>> since the only time we touch anything in them is in
>>> mca_base_var_finalize. That function cleans up all of the state, so two
>>> calls to it should be harmless.
>>>
>>>> 3. A bisection study reveals that problems first appear after commit:
>>>> https://svn.open-mpi.org/trac/ompi/changeset/28800/trunk/opal/mca/base/mca_base_var.c
>>> Possibly also a coincidence. That commit only 1) moves the group stuff
>>> into its own file, and 2) adds the mca_base_pvar interface. It's possible
>>> I messed something up in the rest of the code, but unlikely. I will take
>>> another look, though.
>>>
>>> -Nathan
>>>
>>>>
>>>> Josh
>>>>
>>>> -----Original Message-----
>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
>>>> Sent: Monday, December 16, 2013 12:15 PM
>>>> To: Open MPI Developers
>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>
>>>> It might be worthwhile to run this through valgrind and see if
>>>> something is being freed incorrectly...?
>>>>
>>>>
>>>> On Dec 16, 2013, at 12:11 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>>>>
>>>>> I took a look at the stacktraces last week and could not identify
>>>>> where the bug is. I will dig deeper this week and see if I can come
>>>>> up with the correct fix.
>>>>> -Nathan
>>>>>
>>>>> On Mon, Dec 09, 2013 at 03:17:36PM +0200, Mike Dubman wrote:
>>>>>>    Nathan,
>>>>>>    Could you please comment on the Igor`s observations?
>>>>>>    Thanks
>>>>>>
>>>>>>    On Wed, Dec 4, 2013 at 4:44 PM, Igor Ivanov <igor.iva...@itseez.com>
>>>>>>    wrote:
>>>>>>
>>>>>>      On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
>>>>>>
>>>>>>        On Dec 4, 2013, at 2:52 AM, Igor Ivanov <igor.iva...@itseez.com>
>>>>>>        wrote:
>>>>>>
>>>>>>          It is the first string-typed MCA variable from btl/openib,
>>>>>>          'device_param_files'. In fact, you can disable it and get a
>>>>>>          failure on the second one.
>>>>>>
>>>>>>          Description of the case we see:
>>>>>>          1. openib MCA variables are registered during startup, at the
>>>>>>          component selection stage;
>>>>>>          2. but the winner is the cm component, so the openib MCA
>>>>>>          variables are deregistered as part of their MCA group;
>>>>>>          3. the MCA variables are not removed from the global MCA array,
>>>>>>          but they are marked as invalid and their string memory is freed;
>>>>>>          4. shmem needs openib for yoda and does BML initialization;
>>>>>>          5. the openib MCA variables are registered again using "light
>>>>>>          mode", each finding itself in the global array and refreshing
>>>>>>          its fields;
>>>>>>
>>>>>>        Can you explain what you mean by step 5?  I.e., what does "using
>>>>>>        light mode" mean?  Is the openib component register function
>>>>>>        invoked again?
>>>>>>      That is correct; it is called twice. "Light mode" means that
>>>>>>      mca_base_var_register() does not allocate the MCA variable object
>>>>>>      again; it looks up the variable in the global array and, on finding
>>>>>>      it, updates the fields in the mca_base_var_t structure (at least
>>>>>>      mbv_storage).
>>>>>>
>>>>>>          6. for an unknown reason, BML finalization does not clean up
>>>>>>          these vars as was done in step 2;
>>>>>>          7. mca_btl_openib.so is unloaded;
>>>>>>          8. opal_finalize() destroys the MCA variables in the global
>>>>>>          array, encounters openib's variable, and tries to destroy it
>>>>>>          through an address that is no longer accessible.
>>>>>>
>>>>>>          So the code under discussion fixes step 6.
>>>>>>
>>>>>>        Nathan: it sounds like an MCA var (and entire group) is registered,
>>>>>>        unregistered, and then registered again. Does the MCA var system
>>>>>>        get confused here when it tries to unregister the group a 2nd time?
>>>>>>
>>>>>>      The issue probably relates to incorrectly recognizing whether a
>>>>>>      variable is valid or invalid during the second call of
>>>>>>      mca_base_var_deregister().
>>>>>>
>>>>>>      _______________________________________________
>>>>>>      devel mailing list
>>>>>>      de...@open-mpi.org
>>>>>>      http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>
>>
>> --
>>    Brian W. Barrett
>>    Scalable System Software Group
>>    Sandia National Laboratories
>>
>>
>>
>>
>>
>
>


--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories



