I found the problem: the issue is the assert on the convertor. MPI apps set that convertor, but non-MPI apps do not, so the field is NULL for them and the assert fires. Can we remove that assert?
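
To make the request concrete, here is a minimal sketch of what relaxing the check could look like. It is not the actual OPAL/BTL code: the types `sketch_proc_t` and `sketch_convertor_t`, the default convertor, and both functions are hypothetical names made up for illustration, and falling back to a default is only one possible policy (simply dropping the assert is another).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real Open MPI types. */
typedef struct {
    int arch;                       /* placeholder payload */
} sketch_convertor_t;

typedef struct {
    const char *name;
    sketch_convertor_t *convertor;  /* left NULL by non-MPI apps in this sketch */
} sketch_proc_t;

/* Assumed fallback used when no convertor was ever set. */
static sketch_convertor_t sketch_default_convertor = { 0 };

/* "Before": aborts in a debug build whenever the convertor was never set. */
sketch_convertor_t *sketch_get_convertor_strict(const sketch_proc_t *proc)
{
    assert(NULL != proc->convertor);
    return proc->convertor;
}

/* "After": tolerates a missing convertor instead of asserting. */
sketch_convertor_t *sketch_get_convertor(const sketch_proc_t *proc)
{
    if (NULL == proc || NULL == proc->convertor) {
        return &sketch_default_convertor;
    }
    return proc->convertor;
}
```

With this shape, a tool or non-MPI proc whose convertor field is still NULL gets the default back, while an MPI proc keeps using the convertor it set.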

On Aug 1, 2014, at 9:30 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> I missed the fact that the app doesn't force it. But if this is indeed the
> case then it is extremely weird that you are seeing someone else releasing
> your proc.
>
> Regarding the destruction of the proc, the OPAL layer only does it in a single
> place, when the local proc is set (opal_proc_local_set). Moreover, it does
> call OBJ_RETAIN when it does this, so the proc should not vanish without you
> having control over it.
>
> I looked at the code and noticed that it only crashes in apps, the place where
> the ORTE proc is not provided to the OPAL layer.
>
> George.
>
>
>
> On Fri, Aug 1, 2014 at 12:12 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Aug 1, 2014, at 8:27 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> This commit brings two things. One is the renaming suggested by Gilles. The
>> second one is forcing the ORTE process down on the OPAL. This doesn't fit
>> the current design of the BTL move. The current design assumes that the
>> local OPAL process is part of the local OMPI process.
>
> Your statement isn't accurate - this commit sets the opal_proc_t for all
> *non-MPI* processes. As the comment in ess_base_std_app.c notes, and the
> commit message states, ORTE sets and controls the opal_proc_local structure
> for all ORTE tools and non-MPI procs as (shockingly) they don't call
> MPI_Init, and hence don't go thru ompi_proc_init, and were therefore leaving
> the opal_proc_local structure set to the default "nothing" state. This caused
> all the OPAL layer functions that access it to think nothing had been set up
> yet.
>
> My destruct issue is caused by the OPAL layer destructing the object, which
> seems odd to me but <shrug>
>
>>
>> George.
>>
>> PS: If it doesn't break loose everywhere, it is because the OMPI layer resets its
>> own process after the RTE (which explains why Ralph noticed that his proc has
>> been OBJ_DESTRUCT).
>>
>>
>>
>> On Fri, Aug 1, 2014 at 10:44 AM, <svn-commit-mai...@open-mpi.org> wrote:
>> Author: rhc (Ralph Castain)
>> Date: 2014-08-01 10:44:11 EDT (Fri, 01 Aug 2014)
>> New Revision: 32398
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/32398
>>
>> Log:
>> Some more cleanups. Remove direct references to ORTE by changing
>> OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun,
>> orted, tools) set the OPAL proc structure fields so OPAL knows what is going
>> on and uses the correct print functions (still need to fix the problem for
>> non-MPI apps). Properly return uint32_t from the opal utilities instead of
>> int32_t as that is what the ORTE process name fields contain.
>>
>> Thanks to Gilles for pointing out some of the discrepancies.
>>
>> Text files modified:
>>    trunk/ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c |  2
>>    trunk/ompi/mca/coll/hierarch/coll_hierarch.c        |  2
>>    trunk/ompi/mca/coll/sm/coll_sm_module.c             |  6 ++--
>>    trunk/ompi/mca/dpm/orte/dpm_orte.c                  | 10 ++++----
>>    trunk/ompi/mca/pml/bfo/pml_bfo_failover.c           |  6 ++--
>>    trunk/ompi/mca/rte/orte/rte_orte.h                  |  2
>>    trunk/ompi/proc/proc.c                              | 14 ++++++------
>>    trunk/ompi/runtime/ompi_mpi_abort.c                 |  4 +-
>>    trunk/ompi/runtime/ompi_mpi_init.c                  |  4 +-
>>    trunk/opal/util/proc.c                              |  9 +++----
>>    trunk/opal/util/proc.h                              |  4 +-
>>    trunk/orte/mca/ess/base/ess_base_std_orted.c        |  9 ++++++++
>>    trunk/orte/mca/ess/base/ess_base_std_tool.c         |  9 ++++++++
>>    trunk/orte/mca/ess/hnp/ess_hnp_module.c             |  8 +++++++
>>    trunk/orte/runtime/orte_init.c                      | 42 ++++++++++++++++++++++++++++++++++++++++
>>    trunk/orte/util/proc_info.c                         |  6 +++++
>>    trunk/orte/util/proc_info.h                         |  4 ++
>>    17 files changed, 108 insertions(+), 33 deletions(-)
>>
>>
>> Diff not shown due to size (21547 bytes).
>> To see the diff, run the following command:
>>
>> svn diff -r 32397:32398 --no-diff-deleted
>>
>> _______________________________________________
>> svn mailing list
>> s...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15456.php