Hi dev@,

Quick update on the 1.5.0 release: all previously tracked PRs have been merged and CI is back to normal, so please rebase your PRs. Again, I would like to encourage downstream projects to test against the latest MXNet now to discover bugs and regressions early. I really appreciate your help.

We still have 3 new open issues/PRs to track:
1. Gluon NLP BERT training Haibin mentioned
2. https://github.com/apache/incubator-mxnet/pull/15039
3. https://github.com/apache/incubator-mxnet/pull/15097

Thanks!

Best Regards

Lai

On Tue, May 28, 2019 at 9:32 AM Haibin Lin <haibin.lin....@gmail.com> wrote:

> Hi dev@,
>
> I was testing GluonNLP with MXNet master, and found that BERT training
> crashes a few hours after I launch the job. I can confirm that MXNet pip
> package 20190412 works fine. I am bisecting changes in MXNet/GluonNLP to
> check what causes the problem. I'll send an update as soon as I find the
> root cause, or if I find any workaround.
>
> Thanks,
> Haibin
>
> On Thu, May 23, 2019 at 2:12 AM Lin Yuan <apefor...@gmail.com> wrote:
>
> > Hi Lai,
> >
> > One important PR that is currently blocked by a Flaky TensorRT test:
> >
> > https://github.com/apache/incubator-mxnet/pull/15041
> >
> > I have retriggered it several times. If it fails again, I may need CI
> > team to help disable this test. It has been reported by multiple people:
> > https://github.com/apache/incubator-mxnet/issues/14978
> >
> > Thanks,
> >
> > Lin
> >
> > On Wed, May 22, 2019 at 11:38 PM Zhao, Patric <patric.z...@intel.com>
> > wrote:
> >
> > > Thanks, Lai.
> > >
> > > With the great helps from the community, all PRs listed in the roadmap
> > > are done :)
> > >
> > > https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642
> > >
> > > Update the status of the below list
> > >
> > > - [1] PR#14713 is almost done and wait for internal validation results
> > > - [2] PR#14893 is merged
> > > - [3] PR#15031 is merged
> > > - [7] PR#15038 new PR to fix the bug in C++ interface, will be merged
> > >   soon after the review.
> > >
> > > Feel free to let me know if anything our team can help :)
> > >
> > > BR,
> > >
> > > --Patric
> > >
> > > > -----Original Message-----
> > > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > > Sent: Thursday, May 23, 2019 6:05 AM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > > >
> > > > Hi @dev,
> > > >
> > > > Thanks for working hard for the 1.5 release, since there has been
> > > > several release blockers (mostly fixed). We are extending the code
> > > > freeze to Friday 05/22/2019. Right now we are tracking the following
> > > > 5 open PRs[1][2][3][4][5] and 1 issue[6]. Please let us know if you
> > > > need more time.
> > > >
> > > > I would like to encourage all downstream projects to test with latest
> > > > MXNet to avoid any incompatibility in the coming 1.5.0 release. If you
> > > > have any issues that may block the release, please let us know.
> > > > Thank you very much.
> > > >
> > > > [1] https://github.com/apache/incubator-mxnet/pull/14713
> > > > [2] https://github.com/apache/incubator-mxnet/pull/14893
> > > > [3] https://github.com/apache/incubator-mxnet/pull/15031
> > > > [4] https://github.com/apache/incubator-mxnet/pull/15039
> > > > [5] https://github.com/apache/incubator-mxnet/pull/15041
> > > > [6] https://github.com/apache/incubator-mxnet/issues/15034
> > > >
> > > > Best Regards
> > > >
> > > > Lai
> > > >
> > > > On Wed, May 15, 2019 at 9:05 PM Junru Shao <junrushao1...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > Here I may have a release blocker for 1.5.0 about implementation of
> > > > > dynamic shape mechanism, which somehow conflicts with Gluon's
> > > > > deferred initialization [1].
> > > > >
> > > > > [1] https://github.com/dmlc/gluon-nlp/issues/706
> > > > >
> > > > > On Wed, May 15, 2019 at 12:09 PM Anirudh Subramanian
> > > > > <anirudh2...@gmail.com> wrote:
> > > > >
> > > > > > Hi Lai,
> > > > > >
> > > > > > From the discussion I had with Nvidia offline they are targeting
> > > > > > on pushing the required changes today.
> > > > > > Since this is important feature for the release, if this gets
> > > > > > delayed and cannot be merged by 05/17/2019, the code freeze date
> > > > > > may need to be changed.
> > > > > >
> > > > > > Anirudh
> > > > > >
> > > > > > On Wed, May 15, 2019 at 1:23 AM Lv, Tao A <tao.a...@intel.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi dev,
> > > > > > >
> > > > > > > We see there are several github issues [1][2][3][4] about mxnet
> > > > > > > windows build experience. The team is working intensively
> > > > > > > [5][6][7] on that to fix some problems of MKL-DNN build on
> > > > > > > windows. We hope these fixes can catch the code freeze and
> > > > > > > finally enter the 1.5.0 release.
> > > > > > >
> > > > > > > The PR against mshadow (#374) was already merged and MXNet PR
> > > > > > > #14877 is under review - great thanks to CI team for helping on
> > > > > > > the MKL installation request. PR #14952 is document change
> > > > > > > according to build logic changes in PR #14877. So I think these
> > > > > > > two PRs should be merged simultaneously. Currently #14877 is
> > > > > > > experiencing a CI response problem.
> > > > > > >
> > > > > > > Please take your time to have a look at these two PRs. Your
> > > > > > > comments and suggestions are highly appreciated.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -tao
> > > > > > >
> > > > > > > [1] https://github.com/apache/incubator-mxnet/issues/14670
> > > > > > > [2] https://github.com/apache/incubator-mxnet/issues/14335
> > > > > > > [3] https://github.com/apache/incubator-mxnet/issues/14203
> > > > > > > [4] https://github.com/apache/incubator-mxnet/issues/14085
> > > > > > > [5] https://github.com/apache/incubator-mxnet/pull/14877
> > > > > > > [6] https://github.com/dmlc/mshadow/pull/374
> > > > > > > [7] https://github.com/apache/incubator-mxnet/pull/14952
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > > > > > Sent: Wednesday, May 15, 2019 2:57 PM
> > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > > > > > >
> > > > > > > Hi Anirudh,
> > > > > > >
> > > > > > > I see there was an offline discussion
> > > > > > > <https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341>
> > > > > > > and I have updated the AMP feature and your project on the
> > > > > > > release tracker
> > > > > > > <https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status>.
> > > > > > > Please let me know if you have any updates.
> > > > > > >
> > > > > > > Hi @dev,
> > > > > > > This is a gentle reminder that the code freeze for 1.5.0 release
> > > > > > > is on 05/17/2019, please let us know if you have any WIP pull
> > > > > > > requests aiming for 1.5.0 that needs attention.
> > > > > > > Please understand we already have around 650 commits in master
> > > > > > > that need to be released in time. We understand TensorRT test in
> > > > > > > CI is failing and are trying to fix it.
> > > > > > > Meanwhile please update the tracker if there is any change:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Lai
> > > > > > >
> > > > > > > On Wed, May 8, 2019 at 11:58 AM Anirudh Subramanian
> > > > > > > <anirudh2...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Sheng,
> > > > > > > >
> > > > > > > > I had a discussion with nvidia folks offline today (@ptrendx
> > > > > > > > et. al.). I strongly feel that the AMP feature should be
> > > > > > > > included as part of the release:
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/14173 .
> > > > > > > > The PR is aimed for completion for next week but reviews and
> > > > > > > > RFC discussions may take some time. I would request to extend
> > > > > > > > the release code freeze by 2 weeks.
> > > > > > > > Also, I would like to include
> > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> > > > > > > > which depends on the AMP PR.
> > > > > > > > I am also aiming for adding a PR by this week end or early
> > > > > > > > next week, but reviews will take longer than May 17th.
> > > > > > > >
> > > > > > > > Anirudh
> > > > > > > >
> > > > > > > > On Mon, May 6, 2019 at 11:49 PM Sheng Zha <szha....@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > While 1.4.1 vote on general@incubator is still on going, I’d
> > > > > > > > > like to propose that we start preparing 1.5.0 release.
> > > > > > > > >
> > > > > > > > > 1.5.0 will include changes that date back to last year and
> > > > > > > > > there have been a lot of new features and improvements in
> > > > > > > > > it, so it will likely take us more time to prepare than
> > > > > > > > > 1.4.1. I propose the following timeline:
> > > > > > > > > - Cut release branch: release branch already cut. Will sync
> > > > > > > > >   with master branch on 5/15/2019 EOD.
> > > > > > > > > - Code freeze: 5/17/2019. No more changes unless the release
> > > > > > > > >   branch is in a broken state.
> > > > > > > > > - Tag and vote: 5/20/2019 onward.
> > > > > > > > >
> > > > > > > > > Lai Wei (roywei@) expressed to me offline that he’s willing
> > > > > > > > > to help drive this release as release manager, and I’m happy
> > > > > > > > > to help again as committer.
> > > > > > > > >
> > > > > > > > > If you have features in progress that you’d like to include
> > > > > > > > > in 1.5.0:
> > > > > > > > > - Add your feature to the scope:
> > > > > > > > >   https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > > > > > > > > - Indicate in this thread:
> > > > > > > > >   - how confident you are about making it happen before the
> > > > > > > > >     code freeze. If not confident, provide an estimate for a
> > > > > > > > >     more manageable code freeze date so that people can
> > > > > > > > >     discuss whether to extend the deadline or to skip one
> > > > > > > > >     release for it.
> > > > > > > > >   - whether your PR requires more attention to make it
> > > > > > > > >     happen.
> > > > > > > > >
> > > > > > > > > Thanks for your attention. Comments and suggestions are also
> > > > > > > > > welcome.
> > > > > > > > >
> > > > > > > > > -sz