Unfortunately, it does not happen in my MTT runs, only on a customer cluster
that I have no access to.

If we agree that the flow in question is the result of an internal bug,
it is better to catch it early rather than let garbage in/out, and to use the
power of the community to trace it on other MTT setups as well.

Does assert fit better than abort here?
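
To make the question concrete, here is a rough sketch of the trade-off, not the
actual opal_list.h code. The demo_* names and the DEMO_ENABLE_DEBUG flag are
made up for illustration. A plain abort() fires in every build, while assert()
is compiled out when NDEBUG is defined; the sketch gets the same "debug builds
only" effect with a build flag, so a default build keeps the old
warn-and-return-NULL behavior.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical minimal singly linked list; NOT the real opal_list_t. */
typedef struct demo_item {
    struct demo_item *next;
} demo_item_t;

typedef struct demo_list {
    demo_item_t *head;
} demo_list_t;

/* Push an item onto the front of the list. */
static void demo_list_prepend(demo_list_t *list, demo_item_t *item)
{
    item->next = list->head;
    list->head = item;
}

/* Remove item from list. Returns the item, or NULL if it was not on the list. */
static demo_item_t *demo_list_remove_item(demo_list_t *list, demo_item_t *item)
{
    /* Walk the chain of "next" links until we reach the item or the end. */
    demo_item_t **link = &list->head;
    while (*link != NULL && *link != item) {
        link = &(*link)->next;
    }

    if (*link == NULL) {
        fprintf(stderr, "Warning: item %p is not on the list %p\n",
                (void *) item, (void *) list);
        fflush(stderr);
#if defined(DEMO_ENABLE_DEBUG)
        /* Debug builds only (hypothetical flag): die here so the core file
         * captures the bad flow. Nothing after this line is reached. */
        abort();
#endif
        /* Default builds: non-fatal, as before. */
        return NULL;
    }

    /* Unlink the item and detach it from the rest of the list. */
    *link = item->next;
    item->next = NULL;
    return item;
}
```

With something like this, only builds configured with the debug flag would
generate core files, which may address the concern about dumping cores at scale.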
On Oct 7, 2014 4:38 PM, "Ralph Castain" <r...@open-mpi.org> wrote:

> This may be me mis-communicating with Mike off list. I had suggested he
> add this "feature" to help in catching a rare race condition in his MTT
> runs. However, I had expected him to do it on his private branch, not
> commit it to the main repo.
>
> I agree that I'm not sure what I think about it for the trunk. It is
> indicative of a bug in the code, but if someone hits that bug at
> scale....generating core files at scale can be really bad.
>
>
> On Tue, Oct 7, 2014 at 5:54 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> I'm not sure how I feel about this commit:
>>
>> 1. It blindly ignores the "return" statement.  I.e., if the intent for
>> this commit was to kill the process, that "return" statement should have
>> been deleted, too.
>>
>> 2. We clearly decided a long time ago that removing an item from a list
>> from which it does not belong is NOT a fatal error.  This commit is a
>> fundamental change in behavior that really should have been RFC'ed (e.g., I
>> RFC'ed the calloc-vs-malloc idea last week).
>>
>> I'm not saying that this is a bad change in core behavior, but I would
>> have appreciated a little heads-up and a chance to think about it before it
>> was made (I'm still not sure what I think about this).
>>
>>
>>
>> On Oct 7, 2014, at 7:09 AM, <git...@crest.iu.edu> <git...@crest.iu.edu>
>> wrote:
>>
>> > This is an automated email from the git hooks/post-receive script. It
>> was
>> > generated because a ref change was pushed to the repository containing
>> > the project "open-mpi/ompi".
>> >
>> > The branch, master has been updated
>> >       via  86f1d5af3ee484f34092ad3f7a645d9a5ccbcb6c (commit)
>> >      from  cd48fbeec67f1a511a9cf5ce890fef6cc535ef60 (commit)
>> >
>> > Those revisions listed above that are new to this repository have
>> > not appeared on any other notification email; so we list those
>> > revisions in full, below.
>> >
>> > - Log -----------------------------------------------------------------
>> >
>> https://github.com/open-mpi/ompi/commit/86f1d5af3ee484f34092ad3f7a645d9a5ccbcb6c
>> >
>> > commit 86f1d5af3ee484f34092ad3f7a645d9a5ccbcb6c
>> > Author: Mike Dubman <mi...@mellanox.com>
>> > Date:   Tue Oct 7 14:07:41 2014 +0300
>> >
>> >    OPAL: drop dead with core on bad flow. rarely happens with
>> helloworld on large scale.
>> >
>> > diff --git a/opal/class/opal_list.h b/opal/class/opal_list.h
>> > index b66438e..bad4cbf 100644
>> > --- a/opal/class/opal_list.h
>> > +++ b/opal/class/opal_list.h
>> > @@ -486,6 +486,7 @@ static inline opal_list_item_t
>> *opal_list_remove_item
>> >     if (!found) {
>> >         fprintf(stderr," Warning :: opal_list_remove_item - the item %p
>> is not on the list %p \n",(void*) item, (void*) list);
>> >         fflush(stderr);
>> > +        abort();
>> >         return (opal_list_item_t *)NULL;
>> >     }
>> >
>> >
>> >
>> > -----------------------------------------------------------------------
>> >
>> > Summary of changes:
>> > opal/class/opal_list.h | 1 +
>> > 1 file changed, 1 insertion(+)
>> >
>> >
>> > hooks/post-receive
>> > --
>> > open-mpi/ompi
>> > _______________________________________________
>> > ompi-commits mailing list
>> > ompi-comm...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/10/16019.php
>>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/10/16020.php
>
