Hi dev@,

Quick update on the gluonnlp issue. Lai and I worked together to test
gluonnlp and MXNet with different configurations, and found that the use of
the GELU operator in fp16 is causing the divergence. It was a very recent
change in GluonNLP, and it can be avoided by reverting that change. This no
longer blocks the 1.5 release.

Best,
Haibin

On Thu, May 30, 2019 at 11:33 AM Lai Wei <roywei...@gmail.com> wrote:

> Hi dev@,
>
> Quick update on the 1.5.0 release: all previously tracked PRs have been
> merged and CI is back to normal, so please rebase your PRs.
> Again, I would like to encourage downstream projects to test against the
> latest MXNet now to discover bugs and regressions early. I really
> appreciate your help.
>
> We still have 3 new open issues/PRs to track:
> 1. The GluonNLP BERT training issue Haibin mentioned
> 2. https://github.com/apache/incubator-mxnet/pull/15039
> 3. https://github.com/apache/incubator-mxnet/pull/15097
>
> Thanks!
>
> Best Regards
>
> Lai
>
>
> On Tue, May 28, 2019 at 9:32 AM Haibin Lin <haibin.lin....@gmail.com>
> wrote:
>
> > Hi dev@,
> >
> > I was testing GluonNLP with MXNet master, and found that BERT training
> > crashes a few hours after I launch the job. I can confirm that MXNet pip
> > package 20190412 works fine. I am bisecting changes in MXNet/GluonNLP to
> > check what causes the problem. I'll send an update as soon as I find the
> > root cause, or if I find any workaround.
> >
> > Thanks,
> > Haibin
> >
> > On Thu, May 23, 2019 at 2:12 AM Lin Yuan <apefor...@gmail.com> wrote:
> >
> > > Hi Lai,
> > >
> > > One important PR is currently blocked by a flaky TensorRT test:
> > >
> > > https://github.com/apache/incubator-mxnet/pull/15041
> > >
> > > I have retriggered it several times. If it fails again, I may need the
> > > CI team to help disable this test. It has been reported by multiple people:
> > > https://github.com/apache/incubator-mxnet/issues/14978
> > >
> > > Thanks,
> > >
> > > Lin
> > >
> > > On Wed, May 22, 2019 at 11:38 PM Zhao, Patric <patric.z...@intel.com>
> > > wrote:
> > >
> > > > Thanks, Lai.
> > > >
> > > > With great help from the community, all PRs listed in the roadmap
> > > > are done :)
> > > >
> > > > https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642
> > > >
> > > > Status update for the list below:
> > > >
> > > >  - [1] PR#14713 is almost done and waiting on internal validation results
> > > >  - [2] PR#14893 is merged
> > > >  - [3] PR#15031 is merged
> > > >  - [7] PR#15038 is a new PR to fix a bug in the C++ interface; it will be
> > > >    merged soon after review.
> > > >
> > > > Feel free to let me know if there is anything our team can help with :)
> > > >
> > > > BR,
> > > >
> > > > --Patric
> > > >
> > > > > -----Original Message-----
> > > > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > > > Sent: Thursday, May 23, 2019 6:05 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > > > >
> > > > > Hi @dev,
> > > > >
> > > > > Thanks for working hard on the 1.5 release. Since there have been
> > > > > several release blockers (mostly fixed), we are extending the code
> > > > > freeze to Friday 05/22/2019. Right now we are tracking the following
> > > > > 5 open PRs [1][2][3][4][5] and 1 issue [6]. Please let us know if you
> > > > > need more time.
> > > > >
> > > > > I would like to encourage all downstream projects to test with the
> > > > > latest MXNet to avoid any incompatibility in the coming 1.5.0 release.
> > > > > If you have any issues that may block the release, please let us know.
> > > > > Thank you very much.
> > > > >
> > > > > [1] https://github.com/apache/incubator-mxnet/pull/14713
> > > > > [2] https://github.com/apache/incubator-mxnet/pull/14893
> > > > > [3] https://github.com/apache/incubator-mxnet/pull/15031
> > > > > [4] https://github.com/apache/incubator-mxnet/pull/15039
> > > > > [5] https://github.com/apache/incubator-mxnet/pull/15041
> > > > > [6] https://github.com/apache/incubator-mxnet/issues/15034
> > > > >
> > > > >
> > > > > Best Regards
> > > > >
> > > > > Lai
> > > > >
> > > > >
> > > > > On Wed, May 15, 2019 at 9:05 PM Junru Shao <junrushao1...@gmail.com> wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > I may have a release blocker for 1.5.0 concerning the implementation
> > > > > > of the dynamic shape mechanism, which somehow conflicts with Gluon's
> > > > > > deferred initialization [1].
> > > > > >
> > > > > > [1] https://github.com/dmlc/gluon-nlp/issues/706
> > > > > >
> > > > > > On Wed, May 15, 2019 at 12:09 PM Anirudh Subramanian <anirudh2...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Lai,
> > > > > > >
> > > > > > > From the discussion I had with Nvidia offline, they are aiming to
> > > > > > > push the required changes today.
> > > > > > > Since this is an important feature for the release, if this gets
> > > > > > > delayed and cannot be merged by 05/17/2019, the code freeze date
> > > > > > > may need to be changed.
> > > > > > >
> > > > > > > Anirudh
> > > > > > >
> > > > > > > On Wed, May 15, 2019 at 1:23 AM Lv, Tao A <tao.a...@intel.com> wrote:
> > > > > > >
> > > > > > > > Hi dev,
> > > > > > > >
> > > > > > > > We see there are several GitHub issues [1][2][3][4] about the MXNet
> > > > > > > > Windows build experience. The team is working intensively
> > > > > > > > [5][6][7] to fix some problems with the MKL-DNN build on Windows.
> > > > > > > > We hope these fixes can make the code freeze and be included in
> > > > > > > > the 1.5.0 release.
> > > > > > > >
> > > > > > > > The PR against mshadow (#374) was already merged and MXNet PR
> > > > > > > > #14877 is under review - many thanks to the CI team for helping
> > > > > > > > with the MKL installation request. PR #14952 is a documentation
> > > > > > > > change that follows the build logic changes in PR #14877, so I
> > > > > > > > think these two PRs should be merged simultaneously.
> > > > > > > > Currently #14877 is experiencing a CI response problem.
> > > > > > > >
> > > > > > > > Please take the time to have a look at these two PRs. Your
> > > > > > > > comments and suggestions are highly appreciated.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > -tao
> > > > > > > >
> > > > > > > > [1] https://github.com/apache/incubator-mxnet/issues/14670
> > > > > > > > [2] https://github.com/apache/incubator-mxnet/issues/14335
> > > > > > > > [3] https://github.com/apache/incubator-mxnet/issues/14203
> > > > > > > > [4] https://github.com/apache/incubator-mxnet/issues/14085
> > > > > > > > [5] https://github.com/apache/incubator-mxnet/pull/14877
> > > > > > > > [6] https://github.com/dmlc/mshadow/pull/374
> > > > > > > > [7] https://github.com/apache/incubator-mxnet/pull/14952
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > > > > > > Sent: Wednesday, May 15, 2019 2:57 PM
> > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > > > > > > >
> > > > > > > > Hi Anirudh,
> > > > > > > >
> > > > > > > > I see there was an offline discussion
> > > > > > > > <https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341>
> > > > > > > > and I have updated the AMP feature and your project on the release
> > > > > > > > tracker
> > > > > > > > <https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status>.
> > > > > > > > Please let me know if you have any updates.
> > > > > > > >
> > > > > > > > Hi @dev,
> > > > > > > > This is a gentle reminder that the code freeze for the 1.5.0 release
> > > > > > > > is on 05/17/2019. Please let us know if you have any WIP pull
> > > > > > > > requests aiming for 1.5.0 that need attention.
> > > > > > > > Please understand that we already have around 650 commits in master
> > > > > > > > that need to be released in time. We understand the TensorRT test in
> > > > > > > > CI is failing and are trying to fix it. Meanwhile, please update the
> > > > > > > > tracker if there is any change:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > Lai
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, May 8, 2019 at 11:58 AM Anirudh Subramanian <anirudh2...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Sheng,
> > > > > > > > >
> > > > > > > > > I had a discussion with Nvidia folks offline today (@ptrendx et al.).
> > > > > > > > > I strongly feel that the AMP feature should be included as part of
> > > > > > > > > the release: https://github.com/apache/incubator-mxnet/pull/14173.
> > > > > > > > > The PR is aimed for completion next week, but reviews and RFC
> > > > > > > > > discussions may take some time. I would like to request extending
> > > > > > > > > the release code freeze by 2 weeks.
> > > > > > > > > Also, I would like to include
> > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> > > > > > > > > which depends on the AMP PR.
> > > > > > > > > I am also aiming to add a PR by this weekend or early next
> > > > > > > > > week, but reviews will take longer than May 17th.
> > > > > > > > >
> > > > > > > > > Anirudh
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, May 6, 2019 at 11:49 PM Sheng Zha <szha....@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > While the 1.4.1 vote on general@incubator is still ongoing, I’d
> > > > > > > > > > like to propose that we start preparing the 1.5.0 release.
> > > > > > > > > >
> > > > > > > > > > 1.5.0 will include changes that date back to last year, and there
> > > > > > > > > > have been a lot of new features and improvements in it, so it will
> > > > > > > > > > likely take us more time to prepare than 1.4.1. I propose the
> > > > > > > > > > following timeline:
> > > > > > > > > > - Cut release branch: release branch already cut. Will sync
> > > > > > > > > >   with master branch on 5/15/2019 EOD.
> > > > > > > > > > - Code freeze: 5/17/2019. No more changes unless the release
> > > > > > > > > >   branch is in a broken state.
> > > > > > > > > > - Tag and vote: 5/20/2019 onward.
> > > > > > > > > >
> > > > > > > > > > Lai Wei (roywei@) expressed to me offline that he’s willing to
> > > > > > > > > > help drive this release as release manager, and I’m happy to help
> > > > > > > > > > again as committer.
> > > > > > > > > >
> > > > > > > > > > If you have features in progress that you’d like to include in
> > > > > > > > > > 1.5.0:
> > > > > > > > > > - Add your feature to the scope:
> > > > > > > > > >   https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > > > > > > > > > - Indicate in this thread:
> > > > > > > > > >   - how confident you are about making it happen before the code
> > > > > > > > > >     freeze. If not confident, provide an estimate for a more
> > > > > > > > > >     manageable code freeze date so that people can discuss whether
> > > > > > > > > >     to extend the deadline or to skip one release for it.
> > > > > > > > > >   - whether your PR requires more attention to make it happen.
> > > > > > > > > >
> > > > > > > > > > Thanks for your attention. Comments and suggestions are also
> > > > > > > > > > welcome.
> > > > > > > > > >
> > > > > > > > > > -sz
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
>
