RuRo edited a comment on issue #18090: Aborted unix-gpu CI
URL: https://github.com/apache/incubator-mxnet/issues/18090#issuecomment-616019010

Okay! I think I was able to track down the source of this problem.

I noticed that the last 3 times the pipeline froze for me, the last output was `test_operator_gpu.test_np_true_divide`. The next test that is supposed to run after that is `test_operator_gpu.test_np_unary_bool_funcs`.

After experimenting with `test_operator_gpu.test_np_unary_bool_funcs` a bit, I was able to reproduce the hanging behaviour locally. I just ran the test in a loop, printing something in between runs, and after a while the process got stuck and stopped printing. It didn't happen every time, and setting PRNG seeds didn't make it fail deterministically, so the issue is probably thread-related. Also, it definitely doesn't happen with `MXNET_ENGINE_TYPE=NaiveEngine`.

Once I had a reproducible setup, I spent some time bisecting the `test_operator_gpu.test_np_unary_bool_funcs` test to narrow down the offending operator. In the end, I was able to narrow it down to the `np.empty_like` operator:

```python
import mxnet as mx
import sys

with mx.Context(mx.gpu()):
    while True:
        # Call empty_like 100 times on a GPU array.
        for _ in range(100):
            mx_data = mx.np.array(0)
            mx_x = mx.np.empty_like(mx_data)
        # Print one dot per 100 iterations; the dots stop when the process hangs.
        sys.stdout.buffer.write(b'.')
        sys.stdout.buffer.flush()
```

The above piece of code executes the `empty_like` operator, printing a dot once every 100 iterations, and after ~20 seconds it gets stuck.

I took a quick look at how `empty_like` is implemented but wasn't able to figure out what exactly is wrong with it. It looks like a really thin wrapper around `CustomOp`, which is used from C++, so I am giving up for now.
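
For anyone who wants to repeat the "run it in a loop and see whether it gets stuck" step without watching the terminal, here is a minimal sketch of a hang detector. It is not what I actually ran: the helper names `run_with_timeout` and `count_hangs`, the timeout value and the way the command is passed in are all assumptions made for illustration. It uses only the Python standard library and treats any run that exceeds the timeout as a hang.

```python
import os
import subprocess
import sys


def run_with_timeout(argv, timeout_s, extra_env=None):
    """Run argv in a child process and classify it as 'ok', 'failed' or 'hung'.

    Hypothetical helper: a run that does not finish within timeout_s seconds
    is killed by subprocess.run and reported as 'hung'.
    """
    env = dict(os.environ)
    env.update(extra_env or {})  # e.g. {"MXNET_ENGINE_TYPE": "NaiveEngine"}
    try:
        proc = subprocess.run(
            argv,
            env=env,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if proc.returncode == 0 else "failed"


def count_hangs(argv, runs, timeout_s, extra_env=None):
    """Repeat the command `runs` times; return (number of hangs, all results)."""
    results = [run_with_timeout(argv, timeout_s, extra_env) for _ in range(runs)]
    return results.count("hung"), results


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to test is taken from the command line; the 60 second
    # timeout and the 20 repetitions are arbitrary choices.
    hangs, results = count_hangs(sys.argv[1:], runs=20, timeout_s=60)
    print(f"{hangs} of {len(results)} runs hung")
```

The command under test needs to terminate on its own when it doesn't hang, so the endless `while True` loop in the reproducer above would have to be replaced with a bounded loop first. Passing `extra_env={"MXNET_ENGINE_TYPE": "NaiveEngine"}` gives a quick way to compare hang counts with and without the threaded engine.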
