Thanks Haibin, it was in fact related to scipy. The latest version 1.1.0 causes this test to fail. Switching back to 1.0.1 makes the test pass. I have created an issue at [1].
I'm not familiar with SciPy and don't really know where to start on a fix. For now, I have submitted a PR [2] to pin the version to 1.0.1. Nonetheless, it would be good if we could get this fixed properly. Could somebody assist me here?

Best regards,
Marco

[1]: https://github.com/apache/incubator-mxnet/issues/10901
[2]: https://github.com/apache/incubator-mxnet/pull/10902

On Wed, May 9, 2018 at 11:08 PM, Marco de Abreu <[email protected]> wrote:

> Hi Haibin,
>
> auto scaling is currently not enabled on MXNet Apache CI. This only
> happens on my test environment. Thanks for the hint about SciPy, I will
> definitely look into this!
>
> That's a good idea. I have spoken to Steffen over the last few days and we
> brainstormed some ideas on how to handle test failures. We will let the
> community know once we have a more detailed plan.
>
> Best regards,
> Marco
>
> On Wed, May 9, 2018 at 7:19 PM, Haibin Lin <[email protected]> wrote:
>
>> Hi Marco,
>>
>> Is auto scaling already enabled on MXNet Apache CI, or does this only
>> happen on your setup? I see the test is using scipy. Do both environments
>> have the same version of scipy installed?
>>
>> I have recently seen lots of test failures on mxnet master. One thing on my
>> wish list is a database which stores all the occurrences of test failures
>> and their commit ids, which would be very helpful for an initial diagnosis
>> of which code changes potentially introduced bugs. Otherwise, clicking
>> through all past tests and reading those logs requires a lot of manual work.
>>
>> Best,
>> Haibin
>>
>> On Wed, May 9, 2018 at 5:32 AM, Marco de Abreu <[email protected]> wrote:
>>
>> > Hello,
>> >
>> > I'm currently working on auto scaling and encountering a consistent test
>> > failure on CPU. At the moment, I'm not really sure what's causing this,
>> > considering the setup should be identical.
>> >
>> > http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/ci-master/557/pipeline/694
>> >
>> > ======================================================================
>> > FAIL: test_sparse_operator.test_sparse_mathematical_core
>> > ----------------------------------------------------------------------
>> > Traceback (most recent call last):
>> >   File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
>> >     self.test(*self.arg)
>> >   File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
>> >     orig_test(*args, **kwargs)
>> >   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 1084, in test_sparse_mathematical_core
>> >     density=density, ograd_density=ograd_density)
>> >   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 1056, in check_mathematical_core
>> >     density=density, ograd_density=ograd_density)
>> >   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 698, in check_sparse_mathematical_core
>> >     assert_almost_equal(arr_grad, input_grad, equal_nan=True)
>> >   File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
>> >     raise AssertionError(msg)
>> > AssertionError:
>> > Items are not equal:
>> > Error nan exceeds tolerance rtol=0.000010, atol=0.000000. Location of maximum error:(0, 0), a=inf, b=-inf
>> >  a: array([[inf],
>> >        [inf],
>> >        [inf],...
>> >  b: array([[-inf],
>> >        [-inf],
>> >        [-inf],...
>> >
>> > -------------------- >> begin captured stdout << ---------------------
>> > pass 0
>> > 0.0, 0.0, False
>> > --------------------- >> end captured stdout << ----------------------
>> > -------------------- >> begin captured logging << --------------------
>> > common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2103230797 to reproduce.
>> > --------------------- >> end captured logging << ---------------------
>> >
>> > Does this ring any bells?
>> >
>> > Thanks in advance!
>> >
>> > -Marco
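
A note on the "Error nan exceeds tolerance" line in the quoted trace: the sketch below shows one way a reported error of nan can arise from a=inf and b=-inf. It assumes the tolerance check has the common form |a - b| / (atol + rtol * |b|). That formula and the helper name `relative_violation` are assumptions made for illustration and have not been checked against `mxnet/test_utils.py`.

```python
import math


def relative_violation(a, b, rtol=1e-5, atol=0.0):
    """Hypothetical helper: assumed form |a - b| / (atol + rtol * |b|)."""
    return abs(a - b) / (atol + rtol * abs(b))


inf = float("inf")

# Values taken from the quoted trace: a=inf, b=-inf, rtol=1e-5, atol=0.
# The numerator is |inf - (-inf)| = inf and the denominator is
# 0 + 1e-5 * inf = inf, so the ratio inf / inf is nan.
violation = relative_violation(inf, -inf)
print(math.isnan(violation))  # True
```

Under that assumption, the nan in the message is only a side effect of dividing infinity by infinity. The actual mismatch is the sign of the infinities in the gradient, which is what changed between the two SciPy versions.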
