+1 (binding)

I ran the unit tests successfully on 3 different GPU architectures with CUDA 10.2:

Tesla P100-SXM2 (Pascal, arch = 60)
Tesla V100-SXM2 (Volta, arch = 70)
Tesla T4 (Turing, arch = 75)

Compilation was via ./tests/jenkins/run_test_ubuntu.sh, modified with the additional lines:

echo "DEV=0" >> config.mk
echo "USE_MKLDNN=0" >> config.mk
echo "USE_LAPACK_PATH=/usr/lib/x86_64-linux-gnu" >> config.mk
echo "USE_BLAS=openblas" >> config.mk

The tests run were:

nosetests --verbose tests/python/unittest || exit 1
nosetests --verbose tests/python/gpu/ || exit 1
nosetests --verbose tests/python/train || exit 1

All tests passed on all 3 configurations.

-Dick

On 2020/02/05 19:24:35, "Lausen, Leonard" <lau...@amazon.com.INVALID> wrote:
> Hi Markus,
>
> you point out a critical flaw of the current MXNet website. We don't have any
> versioning and the website is always built from master branch.
>
> Thus while recent improvements to the build system are backwards compatible
> (i.e. old instructions continue to work), there is no way to find the old
> instructions to build "old" releases.
>
> https://github.com/apache/incubator-mxnet/issues/17497 tracks the issue.
>
> Including the package build instructions with the source release makes sense.
> To make sure they don't get out of date, including the html pages built from
> the source release is another option.
>
> Best regards
> Leonard
>
> On Wed, 2020-02-05 at 11:06 -0800, Markus Weimer wrote:
> > Hi,
> >
> > I was trying to follow the build instructions[0] on Ubuntu 18.04.
> > However, I am stumped at step 2:
> >
> > `cp config/config.cmake config.cmake`
> >
> > The file `cmake.conf` does not seem to exist in the tarball on the
> > dist site. `find . -name "cmake.conf" -print` finds nothing. In fact,
> > the `config` folder doesn't seem to exist in the tarball either.
> > However, the file and folder do exist on GitHub[1]. Are the build
> > instructions for a release different from the build from the
> > repository?
> >
> > On a related note: It might make sense to package build instructions
> > with the source release. Websites get updated to reflect current use,
> > and it might be difficult for future users of this version of mxnet to
> > piece together the build instructions.
> >
> > Thanks,
> >
> > Markus
> >
> >
> > [0]: https://mxnet.apache.org/get_started/ubuntu_setup
> > [1]: https://github.com/apache/incubator-mxnet/tree/master/config
> >
> > On Tue, Feb 4, 2020 at 3:05 PM Lausen, Leonard
> > <lau...@amazon.com.invalid> wrote:
> > > Using latest upstream jemalloc
> > > https://github.com/leezu/mxnet/commit/fd4c78a635087f6164344da53a55ba2b67da2fd2
> > > fixes the issue.
> > >
> > > However, there were concerns that this commit relies on unreleased
> > > development features of jemalloc (jemalloc cmake build system support)
> > > and we'll not merge this commit until upstream releases cmake build
> > > system support in a release.
> > >
> > > In the meantime anyone is welcome to work on an equivalent patch based on
> > > the custom build system in latest stable jemalloc.
> > >
> > > On Tue, 2020-02-04 at 22:46 +0000, Lausen, Leonard wrote:
> > > > Bisect identifies
> > > > https://github.com/apache/incubator-mxnet/commit/425319cb59904573bd3fe1b6fe0a7381eceb9bbd
> > > >
> > > > Thus this is an issue with jemalloc + llvm libopenmp.
> > > >
> > > > The correct reproducer for latest master branch is
> > > >
> > > >
> > > > git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
> > > > cd mxnet
> > > > git checkout a726c406964b9cd17efa826738a662e09d973972 # workaround https://github.com/apache/incubator-mxnet/issues/17514
> > > > mkdir build; cd build;
> > > > cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja -DUSE_CUDA=OFF -DUSE_JEMALLOC=ON ..
> > > > ninja
> > > > ./cpp-package/example/test_regress_label # run 2-3 times to reproduce
> > > >
> > > > Let's move the discussion about fixing the jemalloc, openmp
> > > > incompatibility to https://github.com/apache/incubator-mxnet/issues/17043
> > > >
> > > >
> > > > @Chris, could you look into this issue as it only happens with LLVM
> > > > OpenMP?
> > > >
> > > >
> > > > @Przemek: For the 1.6.0 release notes I suggest including a recommendation
> > > > to set USE_JEMALLOC=OFF when compiling from source.
> > > >
> > > > This note should probably be added in any case, as building with
> > > > USE_JEMALLOC=ON is broken on Ubuntu 18.10 and higher, as well as
> > > > Debian Stable.
> > > >
> > > > Given these release notes, +1 for the release.
> > > >
> > > >
> > > > Best regards
> > > > Leonard
> > > >
> > > > On Tue, 2020-02-04 at 22:26 +0000, Lausen, Leonard wrote:
> > > > > Actually below reproducer is wrong. The issue was apparently fixed on
> > > > > master recently. I'm running an automated bisect and will report the
> > > > > result later.
> > > > >
> > > > > On Tue, 2020-02-04 at 21:44 +0000, Lausen, Leonard wrote:
> > > > > > Hi Chris,
> > > > > >
> > > > > > you previously found and fixed an OMP race condition during fork at
> > > > > > https://github.com/apache/incubator-mxnet/pull/17039
> > > > > >
> > > > > > This time no forks are involved. Could you run the following
> > > > > > reproducer on master branch:
> > > > > >
> > > > > > git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
> > > > > > cd mxnet
> > > > > > git checkout a726c406964b9cd17efa826738a662e09d973972 # workaround https://github.com/apache/incubator-mxnet/issues/17514
> > > > > > mkdir build; cd build;
> > > > > > cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja -DUSE_CUDA=OFF ..
> > > > > > ninja
> > > > > > ./cpp-package/example/test_regress_label # run 2-3 times to reproduce
> > > > > >
> > > > > >
> > > > > > As you are an OpenMP expert, you may be able to identify the root cause
> > > > > > with relative ease.
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > Leonard
> > > > > >
> > > > > > On Tue, 2020-02-04 at 11:06 -0800, Chris Olivier wrote:
> > > > > > > When "fixing", please "fix" through actual root-cause analysis (use
> > > > > > > gdb, for instance) and not simply by guesswork and cutting out things
> > > > > > > which probably aren't actually at fault (blaming an OMP library that's
> > > > > > > in worldwide distribution in the billions should be treated with great
> > > > > > > skepticism).
> > > > > > >
> > > > > > > On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan <apefor...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Pedro,
> > > > > > > >
> > > > > > > > While I agree with you we need to fix this usability issue, I don't
> > > > > > > > think this is a release blocker as Przemek mentioned above. Could we
> > > > > > > > fix this in the next minor release?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Lin
> > > > > > > >
> > > > > > > > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Right. Would it be possible to have the CMake build also use libgomp
> > > > > > > > > for consistency with the releases until these issues are resolved?
> > > > > > > > > This can affect anyone compiling the distribution with CMake and
> > > > > > > > > also happens randomly in CI, worsening the contributor experience
> > > > > > > > > due to CI failures.
> > > > > > > > >
> > > > > > > > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak <ptre...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Pedro,
> > > > > > > > > >
> > > > > > > > > > From the issue that you linked it seems that you are using the LLVM
> > > > > > > > > > OpenMP, whereas I believe the actual release uses libgomp (at least
> > > > > > > > > > that's what seems to be the conclusion from this issue:
> > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > > > > > > > >
> > > > > > > > > > Przemek
> > > > > > > > > >
> > > > > > > > > > On 2020/02/04 03:42:30, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > > > > > > > > -1
> > > > > > > > > > >
> > > > > > > > > > > Unit tests passed in CPU build.
> > > > > > > > > > >
> > > > > > > > > > > I observe crashes related to openmp using cpp unit tests:
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > > > > > > > > >
> > > > > > > > > > > Pedro.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat <chai.ba...@gmail.com> wrote:
> > > > > > > > > > > > +1
> > > > > > > > > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > > > > > > > > Tested for OpPerf utility
> > > > > > > > > > > > For CPU -
> > > > > > > > > > > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > > > > > > > > > Works well!
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan <apefor...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1
> > > > > > > > > > > > >
> > > > > > > > > > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > > > > > > > > > >
> > > > > > > > > > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2,
> > > > > > > > > > > > > ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2,
> > > > > > > > > > > > > ✔ OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL,
> > > > > > > > > > > > > ✖ BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER,
> > > > > > > > > > > > > ✔ DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖ DEBUG,
> > > > > > > > > > > > > ✖ TVM_OP]
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv <ta...@apache.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I tested below items:
> > > > > > > > > > > > > > 1. download artifacts from Apache dist repo;
> > > > > > > > > > > > > > 2. the signature looks good;
> > > > > > > > > > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > > > > > > > > > 4. run fp32 and int8 inference of ResNet50 under /example/quantization/.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > thanks,
> > > > > > > > > > > > > > -tao
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv <ta...@apache.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I see. I was looking at this page:
> > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <ptre...@apache.org> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Tao,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Could you tell me where did you look for it and did not find it? I just
> > > > > > > > > > > > > > > > checked and both
> > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/ and
> > > > > > > > > > > > > > > > draft of the release on GitHub have them.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you
> > > > > > > > > > > > > > > > Przemek
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On 2020/02/01 14:23:11, Tao Lv <ta...@apache.org> wrote:
> > > > > > > > > > > > > > > > > It seems the src tar and signature are missing from the tag.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak <ptre...@apache.org> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > This is the vote to release Apache MXNet (incubating) version 1.6.0.
> > > > > > > > > > > > > > > > > > Voting starts today and will close on Monday 2/3/2020 23:59 PST.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Link to release notes:
> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Link to release candidate:
> > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Link to source and signatures on apache dist server:
> > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The differences comparing to previous release candidate 1.6.0.rc1:
> > > > > > > > > > > > > > > > > > * Fixes for license issues (#17361, #17375, #17370, #17460)
> > > > > > > > > > > > > > > > > > * Bugfix for saving LSTM layer parameter (#17288)
> > > > > > > > > > > > > > > > > > * Bugfix for downloading the model from model zoo from multiple processes (#17372)
> > > > > > > > > > > > > > > > > > * Fixed a symbol.py in AMP for GluonNLP (#17408)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > > > Przemyslaw Tredak
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > *Chaitanya Prakash Bapat*
> > > > > > > > > > > > *+1 (973) 953-6299*
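
P.S. For anyone who wants to repeat the test procedure described at the top of this message (four overrides appended to config.mk, then three nosetests suites chained with `|| exit 1`), it could be scripted roughly as below. This is only a sketch under assumptions: the function names `append_build_overrides` and `run_suites` and the `RUNNER` variable are made up for illustration and are not part of run_test_ubuntu.sh or of MXNet.

```shell
#!/bin/sh
# Hypothetical helper: append the four build overrides quoted above
# to the config file given as the first argument.
append_build_overrides() {
    config_file="$1"
    {
        echo "DEV=0"
        echo "USE_MKLDNN=0"
        echo "USE_LAPACK_PATH=/usr/lib/x86_64-linux-gnu"
        echo "USE_BLAS=openblas"
    } >> "$config_file"
}

# Hypothetical helper: run the three test directories in order and stop
# at the first failure, mirroring the `|| exit 1` pattern.
# RUNNER is an assumption added here so the loop can be dry-run,
# e.g. with RUNNER=echo; it defaults to the command used above.
run_suites() {
    runner="${RUNNER:-nosetests --verbose}"
    for suite in tests/python/unittest tests/python/gpu/ tests/python/train; do
        # $runner is intentionally unquoted so "nosetests --verbose" splits
        # into a command and its flag.
        $runner "$suite" || return 1
    done
}
```

Usage would be something like `append_build_overrides config.mk` before the build and `run_suites || exit 1` afterwards; `RUNNER=echo run_suites` only prints the suite paths without running anything.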