Dear MXNet Community,

I recently found that some NaN errors can be traced to floating-point divide-by-zero bugs in the engine backend. By default, however, no exception is thrown when this happens. I added a signal trap to catch the error (https://github.com/apache/incubator-mxnet/pull/13190) and caught a few exceptions when running the Python unit tests. The trap only works on Linux, though.
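For anyone who wants to experiment on platforms where the signal trap is not available, here is a minimal sketch of a different, portable technique: polling the floating-point status flags with the standard `<cfenv>` functions after a computation. It is not the code from the pull request. `DivResult` and `checked_div` are hypothetical names made up for illustration and are not MXNet APIs. The `volatile` qualifiers are an assumption-driven workaround to stop the compiler from folding or reordering the division, since some compilers do not honor `#pragma STDC FENV_ACCESS ON`.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical result type: the quotient plus the status flags that the
// division raised (a subset of FE_DIVBYZERO | FE_INVALID).
struct DivResult {
  float value;
  int flags;
};

// Hypothetical helper, not an MXNet API: divides num by den and reports
// which floating-point exception flags were set by the operation.
DivResult checked_div(float num, float den) {
  // volatile forces the division to happen at run time, between the
  // flag clear and the flag test.
  volatile float n = num;
  volatile float d = den;

  std::feclearexcept(FE_ALL_EXCEPT);  // start from a clean flag state
  volatile float q = n / d;           // the operation under test
  int raised = std::fetestexcept(FE_DIVBYZERO | FE_INVALID);

  return {q, raised};
}
```

With IEEE 754 arithmetic, `checked_div(1.0f, 0.0f)` should report `FE_DIVBYZERO` and return infinity, while `checked_div(0.0f, 0.0f)` should report `FE_INVALID` and return NaN. Polling only detects the problem after the fact and only where a check is placed, so it complements a trap rather than replacing it.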
I would like feedback on the best practice for catching such bugs in the code, and on whether we should enforce such checks in CI. Any comments are appreciated.

Best regards,
Lin