Hi Anirudh,

Just to confirm: are you focusing on the 1.4.0 release of MXNet and wanting a release version of MKL-DNN there, or do you mean that all future development should be based on a release version of MKL-DNN? For the former, I think the 0.17 release of MKL-DNN is a good choice. But it will not have a fix for the LSTM regression mentioned in the previous email.
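The release-vs-snapshot distinction being debated here can be sketched with a small helper (an illustrative assumption, not MXNet tooling; only the `v0.17` tag name comes from the thread):

```python
import re

def is_release_tag(ref: str) -> bool:
    """Heuristic: MKL-DNN style release tags look like v0.17 or v0.17.1,
    while a bleeding-edge snapshot is referenced by a bare commit hash."""
    return re.fullmatch(r"v\d+\.\d+(\.\d+)?", ref) is not None

print(is_release_tag("v0.17"))    # True  -> a pinnable release
print(is_release_tag("6e7cf3c"))  # False -> a snapshot commit
```

A submodule pinned to a ref passing such a check is what "depending on releases, not tips of master" would mean in practice.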
I'm discussing the versioning mechanism with the MKL-DNN maintainers and will get back to you if I get any response. But from the release history of MKL-DNN, I cannot find any evidence of patch releases.

-tao

-----Original Message-----
From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
Sent: Tuesday, November 27, 2018 6:16 AM
To: dev@mxnet.incubator.apache.org
Subject: Re: Include MKLDNN into default mxnet pip package

Hi Tao,

I agree with Steffen that we can start with a stable release of MKLDNN for 1.4.0. Regarding your suggestion of using 0.17, can you provide info on what versioning mechanism MKLDNN uses? Once a MKLDNN release is out and some regressions are found, like the LSTM regression, would it be possible to do a patch release for it or to maintain a release branch for it?

Anirudh

On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A <tao.a...@intel.com> wrote:

> Hi Steffen,
>
> I think all the commits on the MKL-DNN master branch are well tested by
> the MKL-DNN development team. If we really want to have a release commit
> in the coming 1.4 MXNet release, my suggestion is the 0.17 MKL-DNN release.
>
> Thank you,
> Tao
>
> Sent from my iPhone
>
> > On Nov 26, 2018, at 8:09 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
> >
> > +1 to make MKL-DNN default.
> > I'm tracking https://github.com/apache/incubator-mxnet/issues/13369
> > as an open issue to be addressed for 1.4.0. I do agree that we should
> > move to a model that includes released dependencies instead of just
> > taking bleeding-edge snapshots. However, speed of development is
> > important as well.
> > As a compromise for the 1.4.0 release with MKL-DNN: can the MKL-DNN
> > development team provide us with a well-tested tag/commit id to
> > include in the 1.4.0 release?
> > Steffen
> >
> >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <tao.a...@intel.com> wrote:
> >>
> >> Thanks for the information, Kellen and Naveen.
> >>
> >> Unlike onnx-tensorrt, MKL-DNN already provides versioning and release
> >> tags. My concern is that MKL-DNN is still under intensive development:
> >> if it gets a new feature or bug fix on its master branch, do we really
> >> want to wait for the next release to get it supported in MXNet?
> >>
> >> Take the LSTM regression as an example: MKL-DNN will probably land a
> >> fix or improvement on its master branch soon. Do we need to wait for
> >> the 0.18 release to get it fixed for MXNet users? AFAIK, TensorFlow
> >> also pins a normal commit id, not a release, as its MKL-DNN dependency.
> >>
> >> Regarding the LSTM regression, we are using internal JIRA tickets
> >> rather than GitHub issues to track MKL-DNN defects. But I agree with
> >> you, we need to update its progress in Alex's issue.
> >>
> >> Thanks,
> >> -tao
> >>
> >> -----Original Message-----
> >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> >> Sent: Thursday, November 22, 2018 10:55 AM
> >> To: dev@mxnet.incubator.apache.org
> >> Subject: Re: Include MKLDNN into default mxnet pip package
> >>
> >> Agree with your point about other repos also not being based on
> >> versioning, Tao. I would point out that I've given some repos I've
> >> worked with similar feedback:
> >> https://github.com/onnx/onnx-tensorrt/issues/68
> >>
> >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnav...@gmail.com> wrote:
> >>>
> >>> Tao,
> >>>
> >>> You are right, there are many submodules in 3rdparty. We have to
> >>> start somewhere, and I believe this one is a good candidate to start
> >>> with. This is not to cater to MXNet releases or to tie them to the
> >>> releases of the submodules, but to pick only stable releases rather
> >>> than bleeding-edge commits from the tip of master. This gives us
> >>> confidence in the submodule that MXNet users are depending on,
> >>> especially if we make MKLDNN the default.
> >>>
> >>> Good to know it is already a known regression. Alex has created this
> >>> issue: https://github.com/apache/incubator-mxnet/issues/13369.
> >>> Please add details and link the corresponding issue in MKLDNN (I
> >>> couldn't find it).
> >>>
> >>> -Naveen
> >>>
> >>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a...@intel.com> wrote:
> >>>>
> >>>> Here are my answers to the questions from Kellen and Naveen about
> >>>> MKL-DNN. It doesn't mean that I'm supportive of making MKL-DNN the
> >>>> default here.
> >>>>
> >>>> @Kellen,
> >>>>
> >>>> FYI, here is the list of platforms which are officially supported
> >>>> by MKL-DNN: https://github.com/intel/mkl-dnn#system-requirements
> >>>>
> >>>> Most of the computation-intensive kernels in MKL-DNN are JITed, so
> >>>> they generate code according to the platform at runtime. Non-JIT
> >>>> code in MKL-DNN, like other code in MXNet, generates instructions
> >>>> according to the compiler options/flags. We can set -DARCH_OPT_FLAGS
> >>>> when building MKL-DNN to avoid optimizing for the compiling machine;
> >>>> that's exactly what we are doing for the MKL-DNN build in MXNet.
> >>>> Even without MKL-DNN, I noticed there were issues with illegal
> >>>> instructions in MXNet when users imported the pip package on a
> >>>> lower-end machine which probably only supports SSE.
> >>>>
> >>>> @Naveen,
> >>>>
> >>>> The LSTM issue has already been identified as a regression from the
> >>>> recent version of MKL-DNN. Hopefully it will be fixed soon in a new
> >>>> MKL-DNN update.
> >>>>
> >>>> MXNet has many submodule dependencies under the 3rdparty folder,
> >>>> and it seems we don't require release versions for most of them.
> >>>> The release cycles of MKL-DNN and MXNet are not well aligned.
> >>>> I think it would be a risk for an MXNet release to depend strictly
> >>>> on the release of one submodule, to say nothing of depending on the
> >>>> releases of all submodules.
> >>>>
> >>>> -tao
> >>>>
> >>>> -----Original Message-----
> >>>> From: Naveen Swamy [mailto:mnnav...@gmail.com]
> >>>> Sent: Thursday, November 22, 2018 9:08 AM
> >>>> To: dev@mxnet.incubator.apache.org
> >>>> Cc: d...@mxnet.apache.org
> >>>> Subject: Re: Include MKLDNN into default mxnet pip package
> >>>>
> >>>> Hi Alex,
> >>>>
> >>>> Thanks for promptly running the numbers on AMD and reporting them here.
> >>>>
> >>>> Can you please update the AMD numbers here for posterity:
> >>>> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> >>>>
> >>>> Are there any outstanding issues when MKLDNN is enabled? From my
> >>>> offline conversation I am briefly aware of performance issues with
> >>>> LSTM; is there a GitHub issue for it?
> >>>>
> >>>> MKLDNN is a submodule dependency; are we pulling the latest commit
> >>>> or releases? If not, we should move to releases before we make it a
> >>>> default. Ideally we should use platform-specific distributions
> >>>> (-dev packages); at the very least we should rely on well-tested
> >>>> releases.
> >>>>
> >>>> Thanks, Naveen
> >>>>
> >>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander <alex...@amazon.com.invalid> wrote:
> >>>>
> >>>>> AMD benchmarks have been published. We are seeing a 15.8x speedup
> >>>>> with ResNet-50 (batch size 32) on AWS's new m5a.24xlarge machine.
> >>>>> With a smaller network (MobileNet, batch size 32) the speedup is
> >>>>> more significant at 38.7x. Let's have a vote to see if the PR to
> >>>>> have MKLDNN enabled by default
> >>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can be
> >>>>> merged before the 1.4.0 release.
> >>>>>
> >>>>> On 10/19/18, 9:17 AM, "Pedro Larroy" <pedro.larroy.li...@gmail.com> wrote:
> >>>>>
> >>>>> I did pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X
> >>>>> and the unit tests are passing.
> >>>>>
> >>>>> Is this build using AVX512? In /proc/cpuinfo I see only the "avx"
> >>>>> flag. There's no "avx2" like on recent Intel CPUs.
> >>>>>
> >>>>> Pedro.
> >>>>>
> >>>>> On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko <lupe...@gmail.com> wrote:
> >>>>>
> >>>>>> An awesome collaborative effort across many contributors and companies!
> >>>>>>
> >>>>>> The boost is impressive, and for MXNet users to get this boost
> >>>>>> "out of the box" is a great benefit and makes MXNet an even
> >>>>>> better choice.
> >>>>>>
> >>>>>> Alex - can you clarify whether there are any downsides with
> >>>>>> regards to non-AVX-512 architectures, AMD CPUs, etc.? Will it
> >>>>>> fall back gracefully?
> >>>>>>
> >>>>>> Hagay
> >>>>>>
> >>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández <wik...@apache.org> wrote:
> >>>>>>
> >>>>>>> If there is no downside on platforms not supporting AVX512
> >>>>>>> instructions, then +1
> >>>>>>>
> >>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <aza...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hey all,
> >>>>>>>> We have been working hard these past few months to integrate
> >>>>>>>> and stabilize Intel's MKLDNN deep learning CPU accelerator into
> >>>>>>>> MXNet and have made incredible progress. On CPUs with AVX512
> >>>>>>>> instructions (such as c5.18x) we have seen performance
> >>>>>>>> increases of up to 12x, and on other platforms (Macs, AVX2) we
> >>>>>>>> have seen a speedup of 1.5x+.
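Pedro's /proc/cpuinfo check above can be sketched as a small helper (a hedged illustration; the parsing and the sample text are assumptions, not MXNet code):

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Extract the CPU feature flags from /proc/cpuinfo-style text,
    e.g. to check for 'avx', 'avx2', or 'avx512f' before expecting
    JIT-optimized kernels to use those instructions."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

# Sample text standing in for a real read of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx\n"
print("avx" in cpu_flags(sample), "avx2" in cpu_flags(sample))  # True False
```

On a real Linux machine the text would come from `open("/proc/cpuinfo").read()`; an AMD Ryzen 1950X would show "avx" and "avx2" but no "avx512f" flags.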
> >>>>>>>> The full list of benchmarks can be found here:
> >>>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> >>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591
> >>>>>>>>
> >>>>>>>> Currently, using this accelerator requires the developer to
> >>>>>>>> either pip install the mxnet-mkl version of mxnet or to build
> >>>>>>>> it themselves from source. Given that we should try to provide
> >>>>>>>> the best performance "out of the box" with mxnet, we should
> >>>>>>>> include this in the default build. The mkldnn library is
> >>>>>>>> included within the pip package build, so it does not require
> >>>>>>>> an external dependency.
> >>>>>>>>
> >>>>>>>> There were concerns that MKLDNN could cause regressions on
> >>>>>>>> certain platforms (as it did with the tensorflow version a
> >>>>>>>> while back), but we added an env flag (MXNET_MKLDNN_ENABLED)
> >>>>>>>> that allows users to turn off this feature at runtime. Please
> >>>>>>>> bring up any other concerns you may have and your thoughts on
> >>>>>>>> including this accelerator in the default build.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Alex
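The runtime opt-out Alex describes could work roughly like this sketch (only the flag name MXNET_MKLDNN_ENABLED comes from the thread; the exact values accepted and the helper function are assumptions for illustration):

```python
import os

def mkldnn_enabled() -> bool:
    """Sketch of a runtime switch: the feature defaults to on, and is
    disabled when the user sets MXNET_MKLDNN_ENABLED=0 in the environment."""
    return os.environ.get("MXNET_MKLDNN_ENABLED", "1") != "0"

os.environ["MXNET_MKLDNN_ENABLED"] = "0"  # a user opting out at runtime
print(mkldnn_enabled())  # False
```

Equivalently, a user could export the variable in their shell before launching the process, which is what makes an env flag a convenient escape hatch for platform-specific regressions.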