Hi Marco,

Is auto scaling already enabled on the Apache MXNet CI, or does this only
happen on your setup? I see the test uses scipy. Do both environments have
the same version of scipy installed?
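
One quick way to compare would be to run something like the sketch below on
both hosts and diff the output. It assumes Python 3.8+ (for
importlib.metadata), which may not match the Python 3.5 shown in the
traceback, and the package list is only my guess at what matters here.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of packages relevant to the sparse operator test.
    for name, version in installed_versions(["numpy", "scipy", "mxnet"]).items():
        print(name, version or "not installed")
```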

I have recently seen a lot of test failures on MXNet master. One thing on my
wish list is a database that stores every occurrence of a test failure
together with its commit ID, which would be very helpful for an initial
diagnosis of which code changes potentially introduced bugs. Otherwise,
clicking through all past test runs and reading their logs requires a lot of
manual work.
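
To make the idea concrete, here is a minimal sketch using SQLite. The table
name, columns, and helper functions are all hypothetical; nothing like this
exists in the MXNet CI today as far as I know, and a real version would need
a hook in the CI job to call record_failure.

```python
import sqlite3

# Hypothetical schema: one row per observed test failure.
SCHEMA = """
CREATE TABLE IF NOT EXISTS test_failures (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    test_name TEXT NOT NULL,
    commit_id TEXT NOT NULL,
    build_url TEXT,
    recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def open_db(path=":memory:"):
    """Open (or create) the failure database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_failure(conn, test_name, commit_id, build_url=None):
    """Store one failure occurrence together with the commit it ran against."""
    conn.execute(
        "INSERT INTO test_failures (test_name, commit_id, build_url) "
        "VALUES (?, ?, ?)",
        (test_name, commit_id, build_url),
    )
    conn.commit()


def failures_for_test(conn, test_name):
    """List (commit_id, build_url) for a test, in the order recorded."""
    rows = conn.execute(
        "SELECT commit_id, build_url FROM test_failures "
        "WHERE test_name = ? ORDER BY id",
        (test_name,),
    )
    return rows.fetchall()


def failure_counts(conn):
    """Return (test_name, count) pairs, most frequently failing first."""
    rows = conn.execute(
        "SELECT test_name, COUNT(*) FROM test_failures "
        "GROUP BY test_name ORDER BY COUNT(*) DESC, test_name"
    )
    return rows.fetchall()
```

With this, failures_for_test(conn, "test_sparse_operator.test_sparse_mathematical_core")
would give the commits at which that test was seen failing, which is a
starting point for narrowing down the offending change.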

Best,
Haibin

On Wed, May 9, 2018 at 5:32 AM, Marco de Abreu <marco.g.ab...@googlemail.com> wrote:

> Hello,
>
> I'm currently working on auto scaling and encountering a consistent test
> failure on CPU. At the moment, I'm not really sure what's causing this,
> considering the setup should be identical.
>
> http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/ci-master/557/pipeline/694
>
> ======================================================================
> FAIL: test_sparse_operator.test_sparse_mathematical_core
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
>     self.test(*self.arg)
>   File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
>     orig_test(*args, **kwargs)
>   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 1084, in test_sparse_mathematical_core
>     density=density, ograd_density=ograd_density)
>   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 1056, in check_mathematical_core
>     density=density, ograd_density=ograd_density)
>   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 698, in check_sparse_mathematical_core
>     assert_almost_equal(arr_grad, input_grad, equal_nan=True)
>   File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
>     raise AssertionError(msg)
> AssertionError:
> Items are not equal:
> Error nan exceeds tolerance rtol=0.000010, atol=0.000000.  Location of maximum error:(0, 0), a=inf, b=-inf
>  a: array([[inf],
>        [inf],
>        [inf],...
>  b: array([[-inf],
>        [-inf],
>        [-inf],...
> -------------------- >> begin captured stdout << ---------------------
> pass 0
> 0.0, 0.0, False
> --------------------- >> end captured stdout << ----------------------
> -------------------- >> begin captured logging << --------------------
> common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2103230797 to reproduce.
> --------------------- >> end captured logging << ---------------------
>
>
> Does this ring any bells?
>
> Thanks in advance!
>
> -Marco
>
