Hi Haibin,

Auto scaling is currently not enabled on the MXNet Apache CI; the failure only
happens in my test environment. Thanks for the hint about SciPy, I will
definitely look into it!

That's a good idea. I spoke with Steffen over the last few days and we
brainstormed some ideas on how to handle test failures. We will let the
community know once we have a more detailed plan.

Best regards,
Marco
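
As a side note on the wish-list idea in Haibin's message quoted below (a
database that records every test-failure occurrence with its commit ID), a
minimal sketch of what such a tracker could look like is below. This is a
hypothetical design using SQLite, not an existing MXNet CI tool; the schema,
function names, and the sample commit ID are all assumptions.

```python
# Hypothetical sketch of a test-failure database: store each failure
# occurrence with its commit id so regressions can be traced back to the
# code change that introduced them. Not an existing MXNet tool.
import sqlite3

def create_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS test_failures (
            id        INTEGER PRIMARY KEY,
            test_name TEXT NOT NULL,
            commit_id TEXT NOT NULL,
            seed      TEXT,
            logged_at TEXT DEFAULT CURRENT_TIMESTAMP
        )""")
    return conn

def record_failure(conn, test_name, commit_id, seed=None):
    """Log one observed failure of a test at a given commit."""
    conn.execute(
        "INSERT INTO test_failures (test_name, commit_id, seed) VALUES (?, ?, ?)",
        (test_name, commit_id, seed))
    conn.commit()

def suspect_commits(conn, test_name):
    """Commits associated with a failing test, most recent first --
    a starting point for diagnosing which change introduced the bug."""
    return conn.execute(
        "SELECT commit_id, COUNT(*) FROM test_failures "
        "WHERE test_name = ? GROUP BY commit_id ORDER BY MAX(id) DESC",
        (test_name,)).fetchall()

# Example usage with the test from the thread and a made-up commit id:
conn = create_db()
record_failure(conn, "test_sparse_operator.test_sparse_mathematical_core",
               "abc1234", seed="2103230797")
print(suspect_commits(conn, "test_sparse_operator.test_sparse_mathematical_core"))
```

Keying failures by commit ID (plus the reproduction seed from the log) would
let a query like `suspect_commits` replace the manual clicking through past
runs that Haibin describes.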

On Wed, May 9, 2018 at 7:19 PM, Haibin Lin <haibin.lin....@gmail.com> wrote:

> Hi Marco,
>
> Is auto scaling already enabled on the mxnet Apache CI, or does this only
> happen on your setup? I see the test uses scipy. Do both environments have
> the same version of scipy installed?
>
> I have recently seen a lot of test failures on mxnet master. One thing on
> my wish list is a database that stores all occurrences of test failures
> along with their commit IDs, which would be very helpful for initially
> diagnosing which code changes potentially introduced bugs. Otherwise,
> clicking through all past test runs and reading their logs requires a lot
> of manual work.
>
> Best,
> Haibin
>
> On Wed, May 9, 2018 at 5:32 AM, Marco de Abreu <marco.g.ab...@googlemail.com> wrote:
>
> > Hello,
> >
> > I'm currently working on auto scaling and encountering a consistent test
> > failure on CPU. At the moment, I'm not really sure what's causing this,
> > considering the setup should be identical.
> >
> > http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/ci-master/557/pipeline/694
> >
> > ======================================================================
> > FAIL: test_sparse_operator.test_sparse_mathematical_core
> > ----------------------------------------------------------------------
> > Traceback (most recent call last):
> >   File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
> >     self.test(*self.arg)
> >   File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
> >     orig_test(*args, **kwargs)
> >   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 1084, in test_sparse_mathematical_core
> >     density=density, ograd_density=ograd_density)
> >   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 1056, in check_mathematical_core
> >     density=density, ograd_density=ograd_density)
> >   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 698, in check_sparse_mathematical_core
> >     assert_almost_equal(arr_grad, input_grad, equal_nan=True)
> >   File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
> >     raise AssertionError(msg)
> > AssertionError:
> > Items are not equal:
> > Error nan exceeds tolerance rtol=0.000010, atol=0.000000.  Location of maximum error: (0, 0), a=inf, b=-inf
> >  a: array([[inf],
> >        [inf],
> >        [inf],...
> >  b: array([[-inf],
> >        [-inf],
> >        [-inf],...
> > -------------------- >> begin captured stdout << ---------------------
> > pass 0
> > 0.0, 0.0, False
> > --------------------- >> end captured stdout << ----------------------
> > -------------------- >> begin captured logging << --------------------
> > common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2103230797 to reproduce.
> > --------------------- >> end captured logging << ---------------------
> >
> >
> > Does this ring any bells?
> >
> > Thanks in advance!
> >
> > -Marco
> >
>
