I found two bugs related to the Gluon Trainer with a distributed KVStore. Basically, if someone uses Gluon for distributed training with a learning rate schedule (e.g. training ResNet50 for image classification), it won't work.
https://github.com/apache/incubator-mxnet/issues/12713

I have the fix for the first bug locally, but I don't have the fix for the second one.

Best,
Haibin

On Mon, Oct 1, 2018 at 10:14 AM Afrooze, Sina <sina....@gmail.com> wrote:

> This post suggests there is a regression from 1.1.0 to 1.2.1 related to
> MKLDNN integration:
> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
>
> The error is related to MKLDNN layout not being converted back to MXNet
> layout in some operator: "!IsMKLDNNData() We can’t generate TBlob for
> MKLDNN data. Please use Reorder2Default() to generate a new NDArray first"
>
> Sina
>
> On 9/30/18, 6:55 PM, "Steffen Rochel" <steffenroc...@gmail.com> wrote:
>
> Thanks Patrick.
> Updated roadmap and next release content.
>
> Patrick - suggest to send a reminder to review the design doc and collect
> feedback.
> Are there still known issues or gaps before we declare MKL-DNN integration
> as GA?
>
> Regards,
> Steffen
>
> On Sat, Sep 29, 2018 at 1:31 AM Zhao, Patric <patric.z...@intel.com> wrote:
>
> > Thanks, Steffen.
> >
> > Regarding the next release note, two items from our side:
> >
> > 1. (-remove) MKL-DNN integration is done. I think we can remove this item.
> > 2. (+add) MKL-DNN based graph optimization and quantization by subgraph
> >    Design doc:
> >    https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
> >    Lead Contributor: Patric Zhao, https://github.com/pengzhao-intel/
> >
> > Regarding the Roadmap
> > (+add) Q1 2019: MKL-DNN RNN API supports
> >
> > BR,
> >
> > Thanks,
> >
> > --Patric
> >
> > > -----Original Message-----
> > > From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > > Sent: Saturday, September 29, 2018 11:31 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: [Discuss] Next MXNet release
> > >
> > > Sorry I meant to say next 'Regarding the *minor* release'.
> > >
> > > On Sat, Sep 29, 2018 at 5:27 AM kellen sunderland
> > > <kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Thanks for transparently setting a rough timeline Steffen. I think
> > > > this will go a long way in helping the community plan their work, even
> > > > if the details change somewhat on the road to the release.
> > > >
> > > > Regarding the major release: I would propose we unify TensorRT with
> > > > the subgraph operator work.
> > > >
> > > > Regarding the patch release: There were a few minor stack/buffer
> > > > overflows exposed by ASAN that have been addressed. It's probably a
> > > > good idea to include them in a patch release, as they at best result
> > > > in non-deterministic behaviour.
> > > >
> > > > -Kellen
> > > >
> > > > On Sat, Sep 29, 2018 at 1:39 AM Steffen Rochel
> > > > <steffenroc...@gmail.com> wrote:
> > > >
> > > >> I updated
> > > >> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release ,
> > > >> removed the completed items from 1.3 release and would like to kick
> > > >> off discussion about the next release. Please suggest what you would
> > > >> like to see included in the next release together with link to design
> > > >> proposal (appropriately for the size and complexity of the proposal)
> > > >> or suggest changes.
> > > >> I suggest to target the next release for December 2018 to frame the
> > > >> discussion.
> > > >> Lets include review of
> > > >> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Roadmap -
> > > >> time to update and discuss changes.
> > > >>
> > > >> From the 1.3 release we had discussion regarding
> > > >> https://github.com/apache/incubator-mxnet/issues/11849 and resolution
> > > >> in https://github.com/apache/incubator-mxnet/pull/12412 .
> > > >> Are you aware of critical issues and feedback from user which we
> > > >> should consider for a potential 1.3.1 patch release. Should we
> > > >> include PR 12412 in a potential patch release?
> > > >>
> > > >> Regards,
> > > >> Steffen