Actually, I have a linking problem on my Ubuntu desktop that is fixed in master:

lc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int>, std::allocator<dmlc::data::RowBlockContainer<unsigned int> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int>, std::allocator<dmlc::data::RowBlockContainer<unsigned int> > >**)>, std::function<void ()>)::{lambda()#1}&)':
/usr/include/c++/5/thread:137: undefined reference to `pthread_create'
3rdparty/dmlc-core/libdmlc.a(data.cc.o): In function `std::thread::thread<dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned long>, std::allocator<dmlc::data::RowBlockContainer<unsigned long> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned long>, std::allocator<dmlc::data::RowBlockContainer<unsigned long> > >**)>, std::function<void ()>)::{lambda()#1}&>(dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned long>, std::allocator<dmlc::data::RowBlockContainer<unsigned long> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned long>, std::allocator<dmlc::data::RowBlockContainer<unsigned long> > >**)>, std::function<void ()>)::{lambda()#1}&)':
/usr/include/c++/5/thread:137: undefined reference to `pthread_create'
3rdparty/dmlc-core/libdmlc.a(data.cc.o): In function `std::thread::thread<dmlc::ThreadedIter<dmlc::data::RowBlockContainer<unsigned int> >::Init(std::function<bool (dmlc::data::RowBlockContainer<unsigned int>**)>, std::function<void ()>)::{lambda()#1}&>(dmlc::ThreadedIter<dmlc::data::RowBlockContainer<unsigned int> >::Init(std::function<bool (dmlc::data::RowBlockContainer<unsigned int>**)>, std::function<void ()>)::{lambda()#1}&)':
/usr/include/c++/5/thread:137: undefined reference to `pthread_create'
3rdparty/dmlc-core/libdmlc.a(data.cc.o): In function `std::thread::thread<dmlc::ThreadedIter<dmlc::data::RowBlockContainer<unsigned long> >::Init(std::function<bool (dmlc::data::RowBlockContainer<unsigned long>**)>, std::function<void
()>)::{lambda()#1}&>(dmlc::ThreadedIter<dmlc::data::RowBlockContainer<unsigned long> >::Init(std::function<bool (dmlc::data::RowBlockContainer<unsigned long>**)>, std::function<void ()>)::{lambda()#1}&)':
/usr/include/c++/5/thread:137: undefined reference to `pthread_create'
3rdparty/dmlc-core/libdmlc.a(io.cc.o): In function `std::thread::thread<dmlc::ThreadedIter<dmlc::io::InputSplitBase::Chunk>::Init(std::function<bool (dmlc::io::InputSplitBase::Chunk**)>, std::function<void ()>)::{lambda()#1}&>(dmlc::ThreadedIter<dmlc::io::InputSplitBase::Chunk>::Init(std::function<bool (dmlc::io::InputSplitBase::Chunk**)>, std::function<void ()>)::{lambda()#1}&)':
/usr/include/c++/5/thread:137: undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

Can we update dmlc-core on the release branch? This was recently fixed:
https://github.com/dmlc/dmlc-core/commit/b744643f386660ddc39467a04e3a98853a7419b9

On Sat, May 5, 2018 at 11:59 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:

> Hi
>
> Looks like only gluon test lambda is failing intermittently, but looks
> like a minor numerical issue.
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/20/pipeline
>
> I triggered a few builds yesterday and they all passed. I think Anirudh is
> right.
>
> Changing my vote to +1 (non binding).
>
>
> Pedro.
>
>
>
> On Sat, May 5, 2018 at 12:10 AM, Jun Wu <wujun....@gmail.com> wrote:
>
>> +1
>> I built from source and ran all the model quantization examples
>> successfully.
>>
>> On Fri, May 4, 2018 at 3:05 PM, Anirudh <anirudh2...@gmail.com> wrote:
>>
>> > Hi Pedro, Haibin, Indhu,
>> >
>> > Thank you for your inputs on the release. I ran the test:
>> > `test_module.py:test_forward_reshape` for 250k times with different seeds.
>> > I was unable to reproduce the issue on the release branch.
>> > If everything goes well with CI tests by Pedro running till Sunday, I think
>> > we should move forward with the release (given that we have enough +1s).
>> > Is it possible to trigger the CI on the 1.2 branch repeatedly or at a fixed
>> > schedule till Sunday?
>> >
>> > Anirudh
>> >
>> > On Fri, May 4, 2018 at 11:56 AM, Indhu <indhubhara...@gmail.com> wrote:
>> >
>> > > +1
>> > >
>> > > I've been using CUDA build from this branch (built from source) on Ubuntu
>> > > for couple of days now and I haven't seen any issue.
>> > >
>> > > The flaky tests need to be fixed but this release need not be blocked for
>> > > that.
>> > >
>> > >
>> > > On Fri, May 4, 2018 at 11:32 AM, Haibin Lin <haibin.lin....@gmail.com> wrote:
>> > >
>> > > > I agree with Anirudh that the focus of the discussion should be limited to
>> > > > the release branch, not the master branch. Anything that breaks on master
>> > > > but works on release branch should not block the release itself.
>> > > >
>> > > >
>> > > > Best,
>> > > >
>> > > > Haibin
>> > > >
>> > > > On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>> > > >
>> > > > > I see your point.
>> > > > >
>> > > > > I checked the failures on the v1.2.0 branch and I don't see segfaults, just
>> > > > > minor failures due to flaky tests.
>> > > > >
>> > > > > I will trigger it repeatedly a few times until Sunday to have a and change
>> > > > > my vote accordingly.
>> > > > >
>> > > > > http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/
>> > > > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/17/pipeline
>> > > > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/15/pipeline/
>> > > > >
>> > > > >
>> > > > > Pedro.
>> > > > >
>> > > > > On Fri, May 4, 2018 at 7:16 PM, Anirudh <anirudh2...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi Pedro,
>> > > > > >
>> > > > > > Thank you for the suggestions. I will try to reproduce this without fixed
>> > > > > > seeds and also run it for a longer time duration.
>> > > > > > Having said that, running unit tests over and over for a couple of days
>> > > > > > will likely cause problems because there are around 42 open issues for flaky tests:
>> > > > > > https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3AFlaky
>> > > > > > Also, the release branch has diverged from master around 3 weeks back and
>> > > > > > it doesn't have many of the changes merged to the master.
>> > > > > > So, my question essentially is, what will be your benchmark to accept the
>> > > > > > release?
>> > > > > > Is it that we run the test which you provided on 1.2 without fixed seeds
>> > > > > > and for a longer duration without failures?
>> > > > > > Or is it that all unit tests should pass over a period of 2 days without
>> > > > > > issues? This may require fixing all of the flaky tests which would delay
>> > > > > > the release by a considerable amount of time.
>> > > > > > Or is it something else?
>> > > > > >
>> > > > > > Anirudh
>> > > > > >
>> > > > > >
>> > > > > > On Fri, May 4, 2018 at 4:49 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Could you remove the fixed seeds and run it for a couple of hours with an
>> > > > > > > additional loop? Also I would suggest running the unit tests over and over
>> > > > > > > for a couple of days if possible.
>> > > > > > >
>> > > > > > >
>> > > > > > > Pedro.
>> > > > > > >
>> > > > > > > On Thu, May 3, 2018 at 8:33 PM, Anirudh <anirudh2...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi Pedro and Naveen,
>> > > > > > > >
>> > > > > > > > I am able to reproduce this issue with MKLDNN on the master but not on
>> > > > > > > > the 1.2.RC2 branch.
>> > > > > > > >
>> > > > > > > > Did the following on 1.2.RC2 branch:
>> > > > > > > >
>> > > > > > > > make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
>> > > > > > > > export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
>> > > > > > > > export MXNET_TEST_SEED=11
>> > > > > > > > export MXNET_MODULE_SEED=812478194
>> > > > > > > > export MXNET_TEST_COUNT=10000
>> > > > > > > > nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
>> > > > > > > >
>> > > > > > > > Was able to do the 10k runs successfully.
>> > > > > > > >
>> > > > > > > > Anirudh
>> > > > > > > >
>> > > > > > > > On Thu, May 3, 2018 at 8:46 AM, Anirudh <anirudh2...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Hi Pedro and Naveen,
>> > > > > > > > >
>> > > > > > > > > Is this issue reproducible when MXNet is built with USE_MKLDNN=0?
>> > > > > > > > > Also, there are a bunch of MKLDNN fixes that didn't go into the release
>> > > > > > > > > branch. Is this issue reproducible on the release branch?
>> > > > > > > > > In my opinion, since we have marked MKLDNN as an experimental feature for the
>> > > > > > > > > release, if it is confirmed to be an MKLDNN issue
>> > > > > > > > > we don't need to block the release on it.
>> > > > > > > > >
>> > > > > > > > > Anirudh
>> > > > > > > > >
>> > > > > > > > > On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy <mnnav...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > >> Thanks for raising this issue Pedro.
>> > > > > > > > >>
>> > > > > > > > >> -1 (binding)
>> > > > > > > > >>
>> > > > > > > > >> We were in a similar state for a while a year ago, a lot of effort went to
>> > > > > > > > >> stabilize the tests and the CI. I have seen the PR builds are
>> > > > > > > > >> non-deterministic and you have to retry over and over (wasting resources
>> > > > > > > > >> and time) and hope you get lucky.
>> > > > > > > > >>
>> > > > > > > > >> Look at the dashboard for master build
>> > > > > > > > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
>> > > > > > > > >>
>> > > > > > > > >> -Naveen
>> > > > > > > > >>
>> > > > > > > > >> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > -1 nondeterministic failures on CI master:
>> > > > > > > > >> > https://issues.apache.org/jira/browse/MXNET-396
>> > > > > > > > >> >
>> > > > > > > > >> > Was able to reproduce once in a fresh p3 instance with DLAMI; can't
>> > > > > > > > >> > reproduce consistently.
>> > > > > > > > >> >
>> > > > > > > > >> > On Wed, May 2, 2018 at 9:51 PM, Anirudh <anirudh2...@gmail.com> wrote:
>> > > > > > > > >> >
>> > > > > > > > >> > > Hi all,
>> > > > > > > > >> > >
>> > > > > > > > >> > > As part of RC2 release, we have addressed bugs and some concerns that were
>> > > > > > > > >> > > raised.
>> > > > > > > > >> > >
>> > > > > > > > >> > > I would like to propose a vote to release Apache MXNet (incubating) version
>> > > > > > > > >> > > 1.2.0.RC2. Voting will start now (Wednesday, May 2nd) and end at 12:50 PM
>> > > > > > > > >> > > PDT, Sunday, May 6th.
>> > > > > > > > >> > >
>> > > > > > > > >> > > Link to release notes:
>> > > > > > > > >> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
>> > > > > > > > >> > >
>> > > > > > > > >> > > Link to release candidate 1.2.0.rc2:
>> > > > > > > > >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
>> > > > > > > > >> > >
>> > > > > > > > >> > > Voting results for 1.2.0.rc2:
>> > > > > > > > >> > > https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E
>> > > > > > > > >> > >
>> > > > > > > > >> > > View this page, click on "Build from Source", and use the source code
>> > > > > > > > >> > > obtained from 1.2.0.rc2 tag:
>> > > > > > > > >> > > https://mxnet.incubator.apache.org/install/index.html
>> > > > > > > > >> > >
>> > > > > > > > >> > > (Note: The README.md points to the 1.2.0 tag and does not work at the
>> > > > > > > > >> > > moment.)
>> > > > > > > > >> > >
>> > > > > > > > >> > > Please remember to test first before voting accordingly:
>> > > > > > > > >> > >
>> > > > > > > > >> > > +1 = approve
>> > > > > > > > >> > > +0 = no opinion
>> > > > > > > > >> > > -1 = disapprove (provide reason)
>> > > > > > > > >> > >
>> > > > > > > > >> > > Anirudh
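
For anyone repeating the experiment suggested earlier in this thread (dropping the fixed seeds and looping the test), here is a minimal Python sketch. It reuses the test target and the MXNET_TEST_SEED / MXNET_MODULE_SEED variable names quoted above. The helper names (unseeded_env, run_repeatedly), the iteration count, and the assumption that unsetting those variables lets the test pick a fresh seed on each run are mine and are not confirmed anywhere in the thread.

```python
import os
import subprocess

# Test target quoted in the thread; adjust the path if your checkout differs.
TEST_TARGET = "tests/python/unittest/test_module.py:test_forward_reshape"

# Seed variables quoted in the thread. Assumption: removing them from the
# environment lets the test harness choose fresh seeds on every run.
SEED_VARS = ("MXNET_TEST_SEED", "MXNET_MODULE_SEED")


def unseeded_env(base_env):
    """Return a copy of base_env without the fixed-seed variables."""
    return {k: v for k, v in base_env.items() if k not in SEED_VARS}


def run_repeatedly(iterations, runner="nosetests-2.7"):
    """Run the test up to `iterations` times.

    Returns the 1-based index of the first failing run, or 0 if every run passed.
    """
    env = unseeded_env(os.environ)
    for i in range(1, iterations + 1):
        result = subprocess.run([runner, "-v", TEST_TARGET], env=env)
        if result.returncode != 0:
            return i
    return 0


if __name__ == "__main__":
    # Hypothetical iteration count; pick whatever fits your time budget.
    failed_at = run_repeatedly(iterations=100)
    print("first failing run:", failed_at if failed_at else "none")
```

Stopping at the first failure keeps that run's output at the end of the log, which should make it easier to copy the reported seed into a bug report.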