I'm not sure the problem was introduced by the changes you cite, but it may have been exposed by them. Either way, the patch is correct: the TCP component will NULL the entry in the hash table, but that doesn't remove the key, so the hash_table lookup request will return "success" with a NULL pointer.
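To make the failure mode concrete, here is a minimal sketch of it, assuming a toy fixed-size table rather than the real OPAL hash table. Every name in it (toy_table_t, toy_set, toy_get, toy_release, toy_cleanup) is hypothetical and is not taken from the Open MPI tree. It only illustrates that storing NULL under a key leaves the key in place, so a later lookup succeeds and hands back NULL, and that the cleanup loop therefore has to check the value as well as the return code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from Open MPI. */
#define TOY_SUCCESS        0
#define TOY_ERR_NOT_FOUND (-1)
#define TOY_ERR_FULL      (-2)
#define TOY_SLOTS          8

typedef struct {
    int refcount;
} toy_object_t;

typedef struct {
    int      used;   /* 1 once a key has been stored in this slot */
    uint64_t key;
    void    *value;  /* may be NULL while the key is still present */
} toy_slot_t;

typedef struct {
    toy_slot_t slots[TOY_SLOTS];
} toy_table_t;

/* Store value under key. Overwriting an existing key keeps the key,
 * so storing NULL does not remove the entry. */
static int toy_set(toy_table_t *t, uint64_t key, void *value)
{
    toy_slot_t *free_slot = NULL;
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (t->slots[i].used && t->slots[i].key == key) {
            t->slots[i].value = value;
            return TOY_SUCCESS;
        }
        if (!t->slots[i].used && NULL == free_slot) {
            free_slot = &t->slots[i];
        }
    }
    if (NULL == free_slot) {
        return TOY_ERR_FULL;
    }
    free_slot->used = 1;
    free_slot->key = key;
    free_slot->value = value;
    return TOY_SUCCESS;
}

/* Look up key. Success only means the key exists; *value can be NULL. */
static int toy_get(const toy_table_t *t, uint64_t key, void **value)
{
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (t->slots[i].used && t->slots[i].key == key) {
            *value = t->slots[i].value;
            return TOY_SUCCESS;
        }
    }
    return TOY_ERR_NOT_FOUND;
}

/* Drop one reference and free at zero. It dereferences obj, so a NULL
 * argument would crash here; callers must filter NULL out first. */
static void toy_release(toy_object_t *obj)
{
    if (0 == --obj->refcount) {
        free(obj);
    }
}

/* Guarded cleanup: release only entries whose lookup succeeded AND
 * whose value is non-NULL. Returns the number of objects released. */
static int toy_cleanup(toy_table_t *t, const uint64_t *keys, size_t nkeys)
{
    int released = 0;
    for (size_t i = 0; i < nkeys; i++) {
        void *value = NULL;
        if (TOY_SUCCESS == toy_get(t, keys[i], &value) && NULL != value) {
            toy_release(value);
            toy_set(t, keys[i], NULL);  /* do not leave a dangling pointer */
            released++;
        }
    }
    return released;
}
```

Without the `NULL != value` test in toy_cleanup, the entry that was set to NULL would reach toy_release and be dereferenced, which is the same shape of crash as the one reported at the OBJ_RELEASE call.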
On Jun 8, 2014, at 10:24 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Folks,
>
> several mtt tests (ompi-trunk r31963) failed (SIGSEGV in mpirun) with a
> similar stack trace.
>
> For example, you can refer to :
> http://mtt.open-mpi.org/index.php?do_redir=2199
>
> the issue is not related whatsoever to the init_thread_serialized test
> (other tests failed with similar symptoms)
>
> so far i could find that :
> - the issue is intermittent and can be hard to reproduce (1 failure over 1000
>   runs)
> - per the mtt logs, it seems this is quite a recent failure
> - a necessary condition is that MPI tasks exit with a non zero status after
>   having called MPI_Finalize()
> - the crash occurs in orte/mca/oob/base/oob_base_frame.c at line 89 when
>   invoking
>       OBJ_RELEASE(value) ;
>   in some rare cases, value is NULL which causes the crash.
> - though i cannot incriminate one changeset in particular, i highly suspect
>   the changes that were made in order to address the issue(s) discussed at
>   http://www.open-mpi.org/community/lists/devel/2014/05/14908.php
>
> the attached patch works around this issue.
> i did not commit it because i consider this as a work around and not as a fix :
> the root cause might be a tricky race condition ("abort" after MPI_Finalize).
>
>
> as a side note, here is the definition of OBJ_RELEASE
> (opal/class/opal_object.h)
> #if OPAL_ENABLE_DEBUG
> #define OBJ_RELEASE(object) \
>     do { \
>         assert(NULL != ((opal_object_t *) (object))->obj_class); \
>         assert(OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (object))->obj_magic_id); \
>     } while (0)
> ...
> #else
> ...
>
> should we add the following assert at the beginning ?
> assert(NULL != object);
>
>
> Thanks in advance for your comments,
>
> Gilles
> <oob.patch>_______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/06/14994.php
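
For reference, the extra check proposed above could look roughly like the sketch below. It uses a toy reference-counted type and a hypothetical DEMO_RELEASE macro; it is not the real OBJ_RELEASE and only borrows its do/while shape from the quoted excerpt. The assert comes from <assert.h>, so it compiles away when NDEBUG is defined, which is loosely comparable to gating the check on a debug build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy reference-counted object; a hypothetical stand-in, not opal_object_t. */
typedef struct {
    int refcount;
} demo_object_t;

/* Allocate an object holding one reference (NULL if malloc fails). */
static demo_object_t *demo_new(void)
{
    demo_object_t *obj = malloc(sizeof *obj);
    if (NULL != obj) {
        obj->refcount = 1;
    }
    return obj;
}

/* Take an additional reference. */
static void demo_retain(demo_object_t *obj)
{
    obj->refcount++;
}

/* Hypothetical release macro with the NULL check placed first, so a
 * debug build stops at the assert instead of dereferencing NULL.
 * The object must be a modifiable pointer variable, because the macro
 * resets it to NULL once the last reference is gone. */
#define DEMO_RELEASE(object)                                   \
    do {                                                       \
        assert(NULL != (object));                              \
        if (0 == --((demo_object_t *) (object))->refcount) {   \
            free(object);                                      \
            (object) = NULL;                                   \
        }                                                      \
    } while (0)
```

With this shape, releasing a NULL pointer in a debug build fails at the assert with the file and line of the call site, rather than as a SIGSEGV inside the macro body. In a build with NDEBUG defined the check disappears, so the caller-side NULL test is still needed there.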