I ran a cifar10 training on CPU and there seems to be a regression:
training takes about 7% longer than with 1.4.1:

(py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time python cifar10.py --epochs 5
real    11m30.388s
user    417m7.766s
sys     16m57.315s

VS 1.4.1:
real    10m41.994s
user    392m40.646s
sys     12m30.601s
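
As a rough check of the 7% figure, the timings above can be compared with a
small script. This is only a sketch: the helper names parse_time and
percent_increase are made up for illustration, and it assumes the "XmY.YYYs"
format that bash's time prints.

```python
import re


def parse_time(text):
    """Convert a bash `time` duration such as '11m30.388s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_increase(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Timings copied from the two runs above: (newer build, 1.4.1).
runs = {
    "real": ("11m30.388s", "10m41.994s"),
    "user": ("417m7.766s", "392m40.646s"),
    "sys": ("16m57.315s", "12m30.601s"),
}

for name, (new, old) in runs.items():
    delta = percent_increase(parse_time(new), parse_time(old))
    print(f"{name}: {delta:+.1f}%")
```

The wall-clock ("real") increase is what the 7% estimate refers to; a single
run of each version is not enough to rule out noise, so repeating the
measurement a few times would make the comparison more reliable.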


On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <[email protected]> wrote:
>
> Hi Anirudh,
>
> Thanks for jumping into this quickly, I followed up on the issue.
>
> It was meant for sockeye developers/maintainers, to help set up nightly tests
> and raise issues early.
>
> Thanks!
>
> On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin <[email protected]>
> wrote:
>
> > In GluonNLP we test against the MXNet nightly build for each PR, and the CI
> > did catch some MXNet-related issues.
> > I recommend other toolkits also add integration tests with MXNet nightly.
> > It helps identify issues early.
> >
> > Best,
> > Haibin
> >
> > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <[email protected]> wrote:
> >
> > > Thanks for raising the issue; we will take a look ASAP.
> > >
> > > The downstream cases are not in the MXNet CI, so it's hard for MXNet
> > > developers to catch potential bugs or performance degradation.
> > >
> > > In the future, I suggest adding the major downstream test cases, such as
> > > those from sockeye, GluonNLP, GluonCV, DGL, and Gluon-TS, to the nightly tests.
> > > If that is still too heavy, maybe test them weekly or monthly :)
> > >
> > > Thanks,
> > >
> > > --Patric
> > >
> > > > -----Original Message-----
> > > > From: Anirudh Subramanian [mailto:[email protected]]
> > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > To: [email protected]
> > > > Cc: [email protected]
> > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > > >
> > > > Hi Lai,
> > > >
> > > > I have opened an issue:
> > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > I came to know about this issue only today and I have not been monitoring
> > > > sockeye.
> > > > I jumped onto this issue to make sure it wasn't caused by the dlpack changes.
> > > > Also, I don't think sockeye CI checks against master; it is using 1.4.1.
> > > >
> > > > Anirudh
> > > >
> > > >
> > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Could you share which test failed and what the crash is? How can it be
> > > > > reproduced?
> > > > >
> > > > > I was able to install sockeye, and all tests passed when running python
> > > > > setup.py test.
> > > > >
> > > > > I have tested both the nightly pip package and 1.5.0.rc1.
> > > > >
> > > > > It would be great to create an issue with reproducible steps and move
> > > > > the discussion there.
> > > > >
> > > > > Also, I see the sockeye nightly build[1] has been failing for some time. If
> > > > > it’s due to an MXNet change, please raise this early so we can track and
> > > > > solve it in time rather than blocking the release during the vote.
> > > > >
> > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > >
> > > > >
> > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > I was able to reproduce a crash with the commit
> > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > >
> > > > > > Anirudh
> > > > > >
> > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Przemyslaw,
> > > > > > >
> > > > > > > Is there an issue with more details to track the problem?
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > -1
> > > > > > > >
> > > > > > > > There is a crash in the sockeye unit tests (python setup.py test),
> > > > > > > > observed starting with the nightly 1.5 build from 6/13 and still
> > > > > > > > occurring in 1.5rc1. I don't yet have the exact commit that is
> > > > > > > > responsible for it, but it is either
> > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or
> > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op optimization).
> > > > > > > >
> > > > > > > > On 2019/06/20 06:36:22, Lai Wei <[email protected]> wrote:
> > > > > > > > > Dear MXNet community,
> > > > > > > > >
> > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating) version
> > > > > > > > > 1.5.0. Voting on dev@ will start June 19, 23:59:59 (PST) and close on
> > > > > > > > > June 22, 23:59:59.
> > > > > > > > >
> > > > > > > > > 1) Link to release notes:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2) Link to release candidate:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > >
> > > > > > > > > +1 = approve
> > > > > > > > > +0 = no opinion
> > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > --
> > > > > > > > > Best Regards
> > > > > > > > >
> > > > > > > > > Lai
> > > > > > > > >
> > > > > > > >
> > > > > > > --
> > > > > > > Best Regards
> > > > > > >
> > > > > > > Lai
> > > > > > >
> > > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Lai
> > > > >
> > >
> >
> --
> Best Regards
>
> Lai
