I created the following PR to disable the test:

Disable flaky test test_operator.test_dropout (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13200

The second failure, I suppose, is related to:

distributed kvstore bug in MXNet
https://github.com/apache/incubator-mxnet/issues/12713

which was partially fixed by:

Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13121

But another part of the issue is still open and does not have a fix yet:

"When distributed kvstore is used, by default gluon.Trainer doesn't work with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more specific, the trainer updates once per GPU, the LRScheduler object is shared across GPUs and get a wrong update count."

Best
Anton

On Fri, 9 Nov 2018 at 11:48, Anton Chernov <mecher...@gmail.com> wrote:

> If the macOS tests time out as well, we can disable them and keep at least the build stage, as in:
>
> Disable travis tests
> https://github.com/apache/incubator-mxnet/pull/13137
>
> Best
> Anton
>
> On Fri, 9 Nov 2018 at 11:17, Anton Chernov <mecher...@gmail.com> wrote:
>
>>
>> Hi Naveen,
>>
>> I believe that the timeout is not an issue for the branch, and I see great benefit in having macOS tests on the release branch. The Travis build is not blocking anyway, so I don't see any risk in adding it.
>>
>> * test_dropout
>>
>> Currently, there is a problem with test_dropout, which fails consistently on the branch:
>>
>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/97/pipeline
>>
>> Error reported:
>>
>> ======================================================================
>> FAIL: test_operator.test_dropout
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
>>     self.test(*self.arg)
>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py", line 173, in test_new
>>     orig_test(*args, **kwargs)
>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5853, in test_dropout
>>     check_dropout_ratio(0.0, shape)
>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5797, in check_dropout_ratio
>>     assert exe.outputs[0].asnumpy().min() == min_value
>> AssertionError:
>> -------------------- >> begin captured logging << --------------------
>> common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=428273587 to reproduce.
>> --------------------- >> end captured logging << ---------------------
>>
>> The test is enabled on master:
>>
>> Re-enables test_operator.test_dropout
>> https://github.com/apache/incubator-mxnet/pull/12717
>>
>> And there are no failures for it there [1].
>>
>> * KVStore tests
>>
>> Unfortunately, the KVStore tests fail as well.
>>
>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/96/pipeline
>>
>> Error reported:
>>
>> AssertionError
>> test_gluon_trainer_type()
>> assert trainer._update_on_kvstore is update_on_kv\
>> File "dist_sync_kvstore.py", line 388, in test_gluon_trainer_type
>>
>> If nobody has a fix for these issues, I will disable the tests and add information to the known issues section.
>>
>> Best
>> Anton
>>
>> [1] http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
>>
>> On Thu, 8 Nov 2018 at 21:44, Naveen Swamy <mnnav...@gmail.com> wrote:
>>
>>> Anton, I don't think we need to add the Mac OS tests for the 1.3.1 branch, since Travis CI is timing out and creates blockers; it also did not exist for v1.3.0.
>>>
>>>
>>> On Thu, Nov 8, 2018 at 10:04 AM Anton Chernov <mecher...@gmail.com> wrote:
>>>
>>> > A PR to fix the tests:
>>> >
>>> > Remove test for non existing index copy operator (v1.3.x)
>>> > https://github.com/apache/incubator-mxnet/pull/13180
>>> >
>>> >
>>> > Best
>>> > Anton
>>> >
>>> > On Thu, 8 Nov 2018 at 10:05, Anton Chernov <mecher...@gmail.com> wrote:
>>> >
>>> > > An addition has been made to include macOS tests for the v1.3.x branch:
>>> > >
>>> > > [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
>>> > > https://github.com/apache/incubator-mxnet/pull/13179
>>> > >
>>> > > It includes the following PRs for master:
>>> > >
>>> > > [MXNET-908] Enable minimal OSX Travis build
>>> > > https://github.com/apache/incubator-mxnet/pull/12462
>>> > >
>>> > > [MXNET-908] Enable python tests in Travis
>>> > > https://github.com/apache/incubator-mxnet/pull/12550
>>> > >
>>> > > [MXNET-968] Fix MacOS python tests
>>> > > https://github.com/apache/incubator-mxnet/pull/12590
>>> > >
>>> > >
>>> > > Best
>>> > > Anton
>>> > >
>>> > >
>>> > > On Thu, 8 Nov 2018 at 9:38, Anton Chernov <mecher...@gmail.com> wrote:
>>> > >
>>> > >> Thank you everyone for your support and suggestions.
>>> > >> All proposed PRs have been merged. We will tag the release candidate and start the vote on Friday, the 9th of November 2018.
>>> > >>
>>> > >> Unfortunately, after the merges the tests started to fail:
>>> > >>
>>> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/
>>> > >>
>>> > >> I will look into the failures, but, as usual, any help is very much appreciated.
>>> > >>
>>> > >> The nightly tests are fine:
>>> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/
>>> > >>
>>> > >>
>>> > >> Best
>>> > >> Anton
>>> > >>
>>> > >>
>>> > >> On Wed, 7 Nov 2018 at 17:19, Anton Chernov <mecher...@gmail.com> wrote:
>>> > >>
>>> > >>> Yes, you are right about the version wording; thanks for the clarification.
>>> > >>>
>>> > >>> A performance improvement can be considered a bugfix as well. I see no big risks in including the PRs by Haibin and Lin in the patch release.
>>> > >>>
>>> > >>> @Haibin, if you can reopen the PRs, they should be good to go for the release, considering the importance of the improvements.
>>> > >>>
>>> > >>> I propose the following bugfixes for the release as well (I have already created the corresponding PRs):
>>> > >>>
>>> > >>> Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
>>> > >>> https://github.com/apache/incubator-mxnet/pull/13157
>>> > >>>
>>> > >>> fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
>>> > >>> https://github.com/apache/incubator-mxnet/pull/13158
>>> > >>>
>>> > >>> We will start merging the PRs shortly. If there are no more proposals for backporting, I would consider the list final.
>>> > >>>
>>> > >>> Best
>>> > >>> Anton
>>> > >>>
>>> > >>> On Wed, 7 Nov 2018 at 17:01, Sheng Zha <szha....@gmail.com> wrote:
>>> > >>>
>>> > >>>> Hi Anton,
>>> > >>>>
>>> > >>>> I hear your concern about a simultaneous 1.4.0 release, and it certainly is a valid one.
>>> > >>>>
>>> > >>>> Regarding the release, let's agree on the language first. According to semver.org, a 1.3.1 release is considered a patch release, which is for backward compatible bug fixes, while a 1.4.0 release is considered a minor release, which is for backward compatible new features. A major release would mean 2.0.
>>> > >>>>
>>> > >>>> The three PRs suggested by Haibin and Lin all introduce new features. If they go into a patch release, it would require an exception accepted by the community. Also, if other violations happen, they could be grounds for declining a release during the vote.
>>> > >>>>
>>> > >>>> -sz
>>> > >>>>
>>> > >>>> > On Nov 7, 2018, at 2:25 AM, Anton Chernov <mecher...@gmail.com> wrote:
>>> > >>>> >
>>> > >>>> > [MXNET-1179] Enforce deterministic algorithms in convolution layers
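
The still-open part of issue 12713 that Anton quotes at the top of this thread (a trainer that updates once per GPU while sharing one LRScheduler) can be illustrated with a toy model. This is a sketch, not MXNet code: `StepDecayScheduler` and `train` are hypothetical stand-ins, assuming a simple step-decay schedule that multiplies the learning rate by `factor` every `step` updates. It only shows how a shared update counter drifts when it is advanced once per GPU instead of once per batch.

```python
class StepDecayScheduler:
    """Hypothetical step-decay scheduler (not MXNet's LRScheduler API).

    The learning rate is multiplied by `factor` every `step` updates,
    based on a single update counter shared by every caller.
    """

    def __init__(self, base_lr, step, factor):
        self.base_lr = base_lr
        self.step = step
        self.factor = factor
        self.num_update = 0  # one counter shared across all GPUs

    def advance(self):
        self.num_update += 1

    def current_lr(self):
        return self.base_lr * self.factor ** (self.num_update // self.step)


def train(num_batches, num_gpus, scheduler, update_per_gpu):
    """Toy training loop (hypothetical); returns (update count, final lr)."""
    for _ in range(num_batches):
        if update_per_gpu:
            # Problematic pattern: one update per GPU, all of them
            # advancing the same shared scheduler.
            for _gpu in range(num_gpus):
                scheduler.advance()
        else:
            # Intended pattern: the schedule advances once per batch.
            scheduler.advance()
    return scheduler.num_update, scheduler.current_lr()


if __name__ == "__main__":
    print(train(10, 4, StepDecayScheduler(0.1, 10, 0.5), update_per_gpu=True))
    print(train(10, 4, StepDecayScheduler(0.1, 10, 0.5), update_per_gpu=False))
```

With 4 GPUs and 10 batches, the shared counter reaches 40 instead of 10, so a schedule meant to have decayed once (lr 0.05) has already decayed four times (lr 0.00625).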
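
Sheng's distinction between patch, minor and major releases follows the semver.org MAJOR.MINOR.PATCH scheme. Below is a minimal sketch of that classification. It assumes plain three-part version strings without pre-release or build suffixes, and it does not check that lower components are reset to zero. `parse_version`, `classify_release` and `ALLOWED_CHANGES` are illustrative names, not part of any MXNet release tooling.

```python
# What each release type may contain, as described on semver.org.
ALLOWED_CHANGES = {
    "major": "incompatible API changes",
    "minor": "backward compatible new features",
    "patch": "backward compatible bug fixes",
}


def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_release(old, new):
    """Return 'major', 'minor' or 'patch' for a bump from `old` to `new`."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError(f"{new} is not newer than {old}")
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


if __name__ == "__main__":
    for target in ("1.3.1", "1.4.0", "2.0.0"):
        kind = classify_release("1.3.0", target)
        print(target, kind, "->", ALLOWED_CHANGES[kind])
```

Under this scheme, 1.3.1 is a patch release limited to backward compatible bug fixes, which is why new features in it would need an explicit community exception.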