Unfortunately, it does not happen in my MTT runs, only on a customer cluster that I have no access to.
If we agree that the flow in question is the result of an internal bug, it is better to catch it early rather than let garbage in and out, and to use the power of the community to trace it on other MTT setups as well.

Would assert fit better than abort here?

On Oct 7, 2014 4:38 PM, "Ralph Castain" <r...@open-mpi.org> wrote:

> This may be me mis-communicating with Mike off list. I had suggested he
> add this "feature" to help in catching a rare race condition in his MTT
> runs. However, I had expected him to do it on his private branch, not
> commit it to the main repo.
>
> I agree that I'm not sure what I think about it for the trunk. It is
> indicative of a bug in the code, but if someone hits that bug at
> scale....generating core files at scale can be really bad.
>
>
> On Tue, Oct 7, 2014 at 5:54 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> I'm not sure how I feel about this commit:
>>
>> 1. It blindly ignores the "return" statement. I.e., if the intent for
>> this commit was to kill the process, that "return" statement should have
>> been deleted, too.
>>
>> 2. We clearly decided a long time ago that removing an item from a list
>> from which it does not belong is NOT a fatal error. This commit is a
>> fundamental change in behavior that really should have been RFC'ed (e.g., I
>> RFC'ed the calloc-vs-malloc idea last week).
>>
>> I'm not saying that this is a bad change in core behavior, but I would
>> have appreciated a little heads-up and a chance to think about it before it
>> was made (I'm still not sure what I think about this).
>>
>>
>>
>> On Oct 7, 2014, at 7:09 AM, <git...@crest.iu.edu> <git...@crest.iu.edu>
>> wrote:
>>
>> > This is an automated email from the git hooks/post-receive script. It was
>> > generated because a ref change was pushed to the repository containing
>> > the project "open-mpi/ompi".
>> >
>> > The branch, master has been updated
>> >        via  86f1d5af3ee484f34092ad3f7a645d9a5ccbcb6c (commit)
>> >       from  cd48fbeec67f1a511a9cf5ce890fef6cc535ef60 (commit)
>> >
>> > Those revisions listed above that are new to this repository have
>> > not appeared on any other notification email; so we list those
>> > revisions in full, below.
>> >
>> > - Log -----------------------------------------------------------------
>> >
>> > https://github.com/open-mpi/ompi/commit/86f1d5af3ee484f34092ad3f7a645d9a5ccbcb6c
>> >
>> > commit 86f1d5af3ee484f34092ad3f7a645d9a5ccbcb6c
>> > Author: Mike Dubman <mi...@mellanox.com>
>> > Date:   Tue Oct 7 14:07:41 2014 +0300
>> >
>> >     OPAL: drop dead with core on bad flow. rarely happens with helloworld on large scale.
>> >
>> > diff --git a/opal/class/opal_list.h b/opal/class/opal_list.h
>> > index b66438e..bad4cbf 100644
>> > --- a/opal/class/opal_list.h
>> > +++ b/opal/class/opal_list.h
>> > @@ -486,6 +486,7 @@ static inline opal_list_item_t *opal_list_remove_item
>> >      if (!found) {
>> >          fprintf(stderr," Warning :: opal_list_remove_item - the item %p is not on the list %p \n",(void*) item, (void*) list);
>> >          fflush(stderr);
>> > +        abort();
>> >          return (opal_list_item_t *)NULL;
>> >      }
>> >
>> >
>> >
>> > -----------------------------------------------------------------------
>> >
>> > Summary of changes:
>> >  opal/class/opal_list.h | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> >
>> > hooks/post-receive
>> > --
>> > open-mpi/ompi
>> > _______________________________________________
>> > ompi-commits mailing list
>> > ompi-comm...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/10/16019.php
>>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/10/16020.php
>
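
To make the assert-versus-abort question at the top of this message concrete, here is a minimal sketch of what a debug-only check could look like. It is an illustration under stated assumptions, not the actual opal_list.h code: the sketch_* types and functions and the SKETCH_ENABLE_DEBUG switch are hypothetical names, the list is a simplified singly linked one, and returning the removed item on success is a simplification. The idea is that a debug build stops at the bad flow and leaves a core file, while a production build keeps the long-standing warn-and-return-NULL behavior and so does not generate core files at scale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical switch (not an Open MPI macro): build with
 * -DSKETCH_ENABLE_DEBUG=1 to make a missing item fatal. */
#ifndef SKETCH_ENABLE_DEBUG
#define SKETCH_ENABLE_DEBUG 0
#endif

/* Hypothetical, simplified stand-ins for opal_list_item_t / opal_list_t. */
typedef struct sketch_item {
    struct sketch_item *next;
} sketch_item_t;

typedef struct {
    sketch_item_t *head;
} sketch_list_t;

/* Put an item at the front of the list. */
static inline void sketch_list_prepend(sketch_list_t *list,
                                       sketch_item_t *item)
{
    item->next = list->head;
    list->head = item;
}

/* Unlink item from list and return it, or return NULL if the item
 * is not on the list. */
static inline sketch_item_t *sketch_list_remove_item(sketch_list_t *list,
                                                     sketch_item_t *item)
{
    sketch_item_t **link;

    for (link = &list->head; NULL != *link; link = &(*link)->next) {
        if (*link == item) {
            *link = item->next;   /* bypass the item */
            item->next = NULL;
            return item;
        }
    }

    fprintf(stderr, "Warning: item %p is not on the list %p\n",
            (void *) item, (void *) list);
    fflush(stderr);
#if SKETCH_ENABLE_DEBUG
    /* Debug builds only: stop here so the bad flow leaves a core file.
     * Note that assert() is itself compiled out if NDEBUG is defined. */
    assert(0 && "item is not on the list");
#endif
    /* Production builds keep the non-fatal behavior. */
    return NULL;
}
```

With the switch left at 0, removing an item that was never added prints the warning and returns NULL. With it set to 1 (and NDEBUG undefined), the same call aborts through assert(), and the return statement remains as the path taken when the check is disabled.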