I agree with Anirudh that the focus of the discussion should be limited to
the release branch, not the master branch. Anything that breaks on master
but works on the release branch should not block the release itself.


Best,

Haibin

On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy <pedro.larroy.li...@gmail.com>
wrote:

> I see your point.
>
> I checked the failures on the v1.2.0 branch and I don't see segfaults, just
> minor failures due to flaky tests.
>
> I will trigger it a few more times before Sunday and change my vote
> accordingly.
>
> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/17/pipeline
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/15/pipeline/
>
>
> Pedro.
>
> On Fri, May 4, 2018 at 7:16 PM, Anirudh <anirudh2...@gmail.com> wrote:
>
> > Hi Pedro,
> >
> > Thank you for the suggestions. I will try to reproduce this without fixed
> > seeds and also run it for a longer duration.
> > That said, running the unit tests over and over for a couple of days will
> > likely cause problems, because there are around 42 open issues for flaky
> > tests:
> > https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3AFlaky
> > Also, the release branch diverged from master about 3 weeks ago, so it
> > doesn't include many of the changes merged to master since then.
> > So my question is essentially: what will be your benchmark for accepting
> > the release?
> > Is it that we run the test you provided on 1.2 without fixed seeds and
> > for a longer duration without failures?
> > Or is it that all unit tests should pass over a period of 2 days without
> > issues? That may require fixing all of the flaky tests, which would delay
> > the release by a considerable amount of time.
> > Or is it something else?
> >
> > Anirudh
> >
> >
> > On Fri, May 4, 2018 at 4:49 AM, Pedro Larroy <pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > Could you remove the fixed seeds and run it for a couple of hours with an
> > > additional loop? Also, I would suggest running the unit tests over and
> > > over for a couple of days, if possible.
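A rough way to follow this suggestion is a soak loop that reruns the reshape test with the fixed seeds removed, stopping at the first failure or when a time budget runs out. The sketch below assumes bash and an already-built source tree; the `soak` function name, the time budget and the per-iteration `MXNET_TEST_COUNT` value are illustrative choices, not part of MXNet or its CI. It also assumes that leaving `MXNET_TEST_SEED` and `MXNET_MODULE_SEED` unset lets the test harness pick fresh random seeds on each run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MXNet): rerun a command until it fails
# or until a time budget, given in seconds, expires.
# Returns 0 if every run passed, 1 on the first failing run.
soak() {
  local duration_secs="$1"
  shift
  local end=$(( $(date +%s) + duration_secs ))
  local iter=0
  while [ "$(date +%s)" -lt "$end" ]; do
    iter=$((iter + 1))
    # Run the remaining arguments as the command under test.
    if ! "$@"; then
      echo "failed on iteration $iter" >&2
      return 1
    fi
  done
  echo "passed $iter iterations"
  return 0
}
```

For example, to soak the reshape test for two hours with the fixed seeds removed:

    unset MXNET_TEST_SEED MXNET_MODULE_SEED
    export MXNET_TEST_COUNT=100
    soak 7200 nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape

Stopping at the first failure keeps the seed printed by that failing run easy to find in the log.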
> > >
> > >
> > > Pedro.
> > >
> > > On Thu, May 3, 2018 at 8:33 PM, Anirudh <anirudh2...@gmail.com> wrote:
> > >
> > > > Hi Pedro and Naveen,
> > > >
> > > > I am able to reproduce this issue with MKLDNN on master, but not on
> > > > the 1.2.RC2 branch.
> > > >
> > > > I did the following on the 1.2.RC2 branch:
> > > >
> > > > make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 \
> > > >     USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
> > > > export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
> > > > export MXNET_TEST_SEED=11
> > > > export MXNET_MODULE_SEED=812478194
> > > > export MXNET_TEST_COUNT=10000
> > > > nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
> > > >
> > > > All 10k runs completed successfully.
> > > >
> > > > Anirudh
> > > >
> > > > On Thu, May 3, 2018 at 8:46 AM, Anirudh <anirudh2...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Pedro and Naveen,
> > > > >
> > > > > Is this issue reproducible when MXNet is built with USE_MKLDNN=0?
> > > > > Also, there are a bunch of MKLDNN fixes that didn't go into the release
> > > > > branch. Is this issue reproducible on the release branch?
> > > > > In my opinion, since we have marked MKLDNN as an experimental feature
> > > > > for the release, if this is confirmed to be an MKLDNN issue we don't
> > > > > need to block the release on it.
> > > > >
> > > > > Anirudh
> > > > >
> > > > > On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy <mnnav...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Thanks for raising this issue, Pedro.
> > > > >>
> > > > >> -1(binding)
> > > > >>
> > > > >> We were in a similar state for a while a year ago, and a lot of effort
> > > > >> went into stabilizing the tests and the CI. I have seen that PR builds
> > > > >> are non-deterministic: you have to retry over and over (wasting
> > > > >> resources and time) and hope you get lucky.
> > > > >>
> > > > >> Look at the dashboard for the master build:
> > > > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
> > > > >>
> > > > >> -Naveen
> > > > >>
> > > > >> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy <pedro.larroy.li...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > -1: nondeterministic failures on CI master:
> > > > >> > https://issues.apache.org/jira/browse/MXNET-396
> > > > >> >
> > > > >> > I was able to reproduce it once on a fresh p3 instance with DLAMI, but
> > > > >> > can't reproduce it consistently.
> > > > >> >
> > > > >> > On Wed, May 2, 2018 at 9:51 PM, Anirudh <anirudh2...@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi all,
> > > > >> > >
> > > > >> > > As part of the RC2 release, we have addressed bugs and some of the
> > > > >> > > concerns that were raised.
> > > > >> > >
> > > > >> > > I would like to propose a vote to release Apache MXNet (incubating)
> > > > >> > > version 1.2.0.RC2. Voting will start now (Wednesday, May 2nd) and end
> > > > >> > > at 12:50 PM PDT, Sunday, May 6th.
> > > > >> > >
> > > > >> > > Link to release notes:
> > > > >> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
> > > > >> > >
> > > > >> > > Link to release candidate 1.2.0.rc2:
> > > > >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
> > > > >> > >
> > > > >> > > Voting results for 1.2.0.rc2:
> > > > >> > > https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E
> > > > >> > >
> > > > >> > > View this page, click on "Build from Source", and use the source code
> > > > >> > > obtained from the 1.2.0.rc2 tag:
> > > > >> > > https://mxnet.incubator.apache.org/install/index.html
> > > > >> > >
> > > > >> > > (Note: The README.md points to the 1.2.0 tag and does not work at the
> > > > >> > > moment.)
> > > > >> > >
> > > > >> > > Please remember to test first, then vote accordingly:
> > > > >> > >
> > > > >> > > +1 = approve
> > > > >> > > +0 = no opinion
> > > > >> > > -1 = disapprove (provide reason)
> > > > >> > >
> > > > >> > > Anirudh
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
