juliusshufan commented on issue #10921: [MXNET-500]Test cases improvement for MKLDNN on Gluon
URL: https://github.com/apache/incubator-mxnet/pull/10921#issuecomment-403206778

@marcoabreu Sorry for the long silence on this PR. I investigated and reproduced the same GPU tests on an NVIDIA P40 machine, and I think the failures previously reported by this CI were caused by running out of memory (though some of the failures were reported as "illegal memory").

I therefore modified my cases to reduce the tensor size by using a smaller input shape, channel number, batch size, etc. On a P40 GPU, all the GPU cases, including test_gluon and the cases under tests/python/gpu, now pass.

As you can see, the latest CI results show far fewer failed cases than before, though the run still fails with "out of memory". I could keep reducing the tensor size to avoid running out of memory in the MXNET preci environment, but as the number of cases grows, out of memory will eventually happen again.

As we discussed before, I agree that it is better for test cases not to be bound to a backend. However, these cases target MKLDNN-related layout and memory reorder, so given the current circumstances, may I suggest running them only on CPU? They are helpful for tracking regressions in the MKLDNN integration.

@zheng-da @TaoLv May I have your comments? Thanks.
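
To make the CPU-only suggestion concrete, here is a minimal sketch of how a case could skip itself when the suite runs on another device. It uses only the Python standard library; `current_device_type`, the `TEST_DEVICE` environment variable, `cpu_only`, and the sample test are all hypothetical names for illustration and are not taken from this PR. A real MXNet test would presumably inspect the default context instead.

```python
import functools
import os
import unittest


def current_device_type():
    # Hypothetical stand-in for querying the device the suite runs on.
    # Assumption: the device is passed through an environment variable.
    return os.environ.get("TEST_DEVICE", "cpu")


def cpu_only(test_fn):
    """Skip the wrapped test at run time unless the device is CPU."""
    @functools.wraps(test_fn)
    def wrapper(*args, **kwargs):
        if current_device_type() != "cpu":
            raise unittest.SkipTest("MKLDNN layout/reorder case: CPU only")
        return test_fn(*args, **kwargs)
    return wrapper


@cpu_only
def test_small_input_shape():
    # Small batch size, channel number, and input shape keep memory use low.
    batch, channels, height, width = 2, 3, 8, 8
    num_elements = batch * channels * height * width
    assert num_elements > 0
    return num_elements
```

Because the check happens when the test is called rather than when it is imported, the same test module could be reused by a GPU run and would simply report these cases as skipped.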
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
