Dear MXNet Community,

I recently found that some NaN errors can be traced to floating-point
divide-by-zero bugs in the engine backend. By default, however, such an
operation does not raise an exception. I added a signal trap to catch this
error (https://github.com/apache/incubator-mxnet/pull/13190) and caught a
few exceptions when running the Python unit tests. However, this only
works on Linux.
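
To illustrate the kind of trap described above, here is a minimal sketch.
It is not the code from the pull request. It assumes Linux with glibc on
hardware that supports floating-point traps (for example x86-64), and the
names division_traps and on_sigfpe are hypothetical, chosen only for this
example. feenableexcept() is a GNU extension, which is one reason the
approach is not portable.

```cpp
// Linux/glibc only: feenableexcept() and fedisableexcept() are GNU extensions.
#include <cassert>   // assert
#include <fenv.h>    // feenableexcept, fedisableexcept, feclearexcept
#include <setjmp.h>  // sigjmp_buf, sigsetjmp, siglongjmp
#include <signal.h>  // sigaction, SIGFPE

static sigjmp_buf g_recover_point;  // where the handler jumps back to

// SIGFPE handler: abandon the faulting instruction and resume at the
// recovery point. Returning normally from a hardware-generated SIGFPE is
// undefined behavior under POSIX, so we jump out instead.
extern "C" void on_sigfpe(int) { siglongjmp(g_recover_point, 1); }

// Hypothetical helper: returns true if computing x / y raised a trapped
// floating-point exception (divide-by-zero or invalid operation).
bool division_traps(float x, float y) {
  struct sigaction action = {};
  struct sigaction previous = {};
  action.sa_handler = on_sigfpe;
  sigemptyset(&action.sa_mask);
  int rc = sigaction(SIGFPE, &action, &previous);
  assert(rc == 0);  // sigaction fails only on invalid arguments
  (void)rc;

  // By default these exceptions only set status flags and the result
  // silently becomes Inf or NaN. Unmasking them makes the CPU trap.
  feclearexcept(FE_ALL_EXCEPT);
  int old_traps = feenableexcept(FE_DIVBYZERO | FE_INVALID);

  bool trapped = false;
  if (old_traps != -1) {  // -1 means traps could not be enabled
    if (sigsetjmp(g_recover_point, 1) == 0) {
      // volatile keeps the compiler from folding the division away
      volatile float numerator = x;
      volatile float denominator = y;
      volatile float quotient = numerator / denominator;
      (void)quotient;
    } else {
      trapped = true;  // reached via siglongjmp from on_sigfpe
    }
  }

  // Restore the previous trap mask and signal handler.
  feclearexcept(FE_ALL_EXCEPT);
  fedisableexcept(FE_ALL_EXCEPT);
  if (old_traps > 0) feenableexcept(old_traps);
  sigaction(SIGFPE, &previous, nullptr);
  return trapped;
}
```

Under these assumptions, division_traps(1.0f, 0.0f) is expected to report
a trap, while division_traps(1.0f, 2.0f) is not. A real check in CI would
more likely enable the traps once at process start and let the signal
abort the test, rather than recover from each fault as this sketch does.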

I would like to get more feedback on the best practice for catching such
bugs in the code and on whether we should enforce such checks in CI. Any
comments are appreciated.

Best Regards,

Lin
