I found the problem: it is the assert on the convertor. MPI apps set that 
convertor, but non-MPI apps do not, so the field is NULL. Can we remove that 
assert?
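
To make the question concrete, here is a minimal sketch of the two options: keep the assert, or tolerate a NULL convertor. Every name in it (toy_proc_t, toy_convertor_t, proc_arch_strict, proc_arch_tolerant, the remote_arch field) is hypothetical and chosen only for illustration. It is not the actual OPAL or OMPI code, and the fallback value is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Open MPI types. */
typedef struct {
    int remote_arch;              /* placeholder payload of the convertor */
} toy_convertor_t;

typedef struct {
    const char *name;
    toy_convertor_t *convertor;   /* set by MPI apps, left NULL by non-MPI apps */
} toy_proc_t;

/* Option 1: assert-style check. In a debug build (NDEBUG not defined)
 * this aborts as soon as a proc without a convertor is passed in. */
int proc_arch_strict(const toy_proc_t *proc)
{
    assert(NULL != proc->convertor);
    return proc->convertor->remote_arch;
}

/* Option 2: tolerate a missing convertor and return a caller-supplied
 * fallback instead of aborting. */
int proc_arch_tolerant(const toy_proc_t *proc, int fallback_arch)
{
    if (NULL == proc->convertor) {
        return fallback_arch;
    }
    return proc->convertor->remote_arch;
}
```

With a proc whose convertor is NULL, the strict variant trips the assert in a debug build, while the tolerant variant simply returns the fallback. Whether a fallback is meaningful at the real call site is a separate question that this sketch does not answer.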


On Aug 1, 2014, at 9:30 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> I missed the fact that the app doesn't force it. But if this is indeed the 
> case, then it is extremely weird that you are seeing someone else releasing 
> your proc.
> 
> Regarding the destruction of the proc, the OPAL layer only does it in a 
> single place, when the local proc is set (opal_proc_local_set). Moreover, it 
> does call OBJ_RETAIN when it does this, so the proc should not vanish without 
> you having control over it.
> 
> I looked at the code and noticed that it only crashes in apps, the place 
> where the ORTE proc is not provided to the OPAL layer.
> 
>   George.
> 
> 
> 
> On Fri, Aug 1, 2014 at 12:12 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> On Aug 1, 2014, at 8:27 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> This commit brings two things. One is the renaming suggested by Gilles. The 
>> second is forcing the ORTE process down onto the OPAL layer. This doesn't 
>> fit the current design of the BTL move. The current design assumes that the 
>> local OPAL process is part of the local OMPI process.
> 
> Your statement isn't accurate - this commit sets the opal_proc_t for all 
> *non-MPI* processes. As the comment in ess_base_std_app.c notes, and the 
> commit message states, ORTE sets and controls the opal_proc_local structure 
> for all ORTE tools and non-MPI procs as (shockingly) they don't call 
> MPI_Init, and hence don't go thru ompi_proc_init, and were therefore leaving 
> the opal_proc_local structure set to the default "nothing" state. This caused 
> all the OPAL layer functions that access it to think nothing had been set up 
> yet.
> 
> My destruct issue is caused by the OPAL layer destructing the object, which 
> seems odd to me but <shrug>
> 
>> 
>>   George.
>> 
>> PS: If it doesn't break loose everywhere, it is because the OMPI layer 
>> resets its own process after the RTE (which explains why Ralph noticed that 
>> his proc had been OBJ_DESTRUCTed).
>> 
>> 
>> 
>> On Fri, Aug 1, 2014 at 10:44 AM, <svn-commit-mai...@open-mpi.org> wrote:
>> Author: rhc (Ralph Castain)
>> Date: 2014-08-01 10:44:11 EDT (Fri, 01 Aug 2014)
>> New Revision: 32398
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/32398
>> 
>> Log:
>> Some more cleanups. Remove direct references to ORTE by changing 
>> OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, 
>> orted, tools) set the OPAL proc structure fields so OPAL knows what is going 
>> on and uses the correct print functions (still need to fix the problem for 
>> non-MPI apps). Properly return uint32_t from the opal utilities instead of 
>> int32_t as that is what the ORTE process name fields contain.
>> 
>> Thanks to Gilles for pointing out some of the discrepancies.
>> 
>> Text files modified:
>>    trunk/ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c |     2
>>    trunk/ompi/mca/coll/hierarch/coll_hierarch.c        |     2
>>    trunk/ompi/mca/coll/sm/coll_sm_module.c             |     6 ++--
>>    trunk/ompi/mca/dpm/orte/dpm_orte.c                  |    10 ++++----
>>    trunk/ompi/mca/pml/bfo/pml_bfo_failover.c           |     6 ++--
>>    trunk/ompi/mca/rte/orte/rte_orte.h                  |     2
>>    trunk/ompi/proc/proc.c                              |    14 ++++++------
>>    trunk/ompi/runtime/ompi_mpi_abort.c                 |     4 +-
>>    trunk/ompi/runtime/ompi_mpi_init.c                  |     4 +-
>>    trunk/opal/util/proc.c                              |     9 +++----
>>    trunk/opal/util/proc.h                              |     4 +-
>>    trunk/orte/mca/ess/base/ess_base_std_orted.c        |     9 ++++++++
>>    trunk/orte/mca/ess/base/ess_base_std_tool.c         |     9 ++++++++
>>    trunk/orte/mca/ess/hnp/ess_hnp_module.c             |     8 +++++++
>>    trunk/orte/runtime/orte_init.c                      |    42 ++++++++++++++++++++++++++++++++++++++++
>>    trunk/orte/util/proc_info.c                         |     6 +++++
>>    trunk/orte/util/proc_info.h                         |     4 ++
>>    17 files changed, 108 insertions(+), 33 deletions(-)
>> 
>> 
>> Diff not shown due to size (21547 bytes).
>> To see the diff, run the following command:
>> 
>>         svn diff -r 32397:32398 --no-diff-deleted
>> 
>> _______________________________________________
>> svn mailing list
>> s...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15456.php
> 
> 
