ma-hei edited a comment on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-656293010


   Interestingly, this test fails before we see the "hang":
   ```
   [2020-07-09T18:17:49.110Z] [gw1] [ 88%] FAILED tests/python/unittest/test_profiler.py::test_profiler
   ```
   Oh right, @leezu, I just saw your comment above. That timeout actually happens before test_profiler fails.
   I'm looking at the newer test run and notice the following: the timeout you describe above (from the previous test run) happens at around 3%. The newer run also shows a timeout, but this time it happens at around 51%. I still think it's related to the dataloader: in the newer run, the same Timeout (and its thread dump) is immediately followed by the log line "PASSED tests/python/unittest/test_gluon_data.py::test_dataloader_context" (so somehow the dataloader test still passes!?).
   ```
   [2020-07-02T19:44:30.029Z] [gw1] [ 50%] PASSED tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape15-mergesort]
   [2020-07-02T19:44:30.286Z] tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape15-heapsort]
   [2020-07-02T19:44:30.286Z] [gw1] [ 50%] PASSED tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape15-heapsort]
   [2020-07-02T19:44:30.543Z] tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape16-quicksort]
   [2020-07-02T19:44:30.798Z] [gw1] [ 51%] PASSED tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape16-quicksort]
   [2020-07-02T19:44:30.798Z] Timeout (0:20:00)!
   [2020-07-02T19:44:30.798Z] Thread 0x00007fd3c0475700 (most recent call first):
   ...
   ...
   ...
   [2020-07-02T19:44:39.208Z] [gw0] [ 51%] PASSED tests/python/unittest/test_gluon_data.py::test_dataloader_context
   ```
   The dataloader test itself, however, was started much earlier (at around 2%):
   ```
   [2020-07-02T19:24:39.154Z] [gw0] [  2%] PASSED tests/python/unittest/test_gluon_data.py::test_multi_worker
   [2020-07-02T19:24:39.154Z] tests/python/unittest/test_gluon_data.py::test_dataloader_context
   ```
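To put numbers on "much earlier": the Jenkins timestamps in the two excerpts show `test_dataloader_context` being logged as started at 19:24:39.154 and as PASSED at 19:44:39.208. A quick check with Python's `datetime` puts that gap at almost exactly the 20-minute (`0:20:00`) timeout. These timestamps record when each line reached the console log, so treat the values as approximate.

```python
from datetime import datetime

# Jenkins console timestamps copied from the log excerpts above.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"
started = datetime.strptime("2020-07-02T19:24:39.154Z", FMT)
timeout_logged = datetime.strptime("2020-07-02T19:44:30.798Z", FMT)
passed = datetime.strptime("2020-07-02T19:44:39.208Z", FMT)

# Seconds between "test_dataloader_context started" and its PASSED line.
run_time = (passed - started).total_seconds()
print(run_time)                                    # 1200.054
# Seconds between the start and the "Timeout (0:20:00)!" line.
print((timeout_logged - started).total_seconds())  # 1191.644
```

So the PASSED line shows up roughly 1200 s, i.e. one full timeout window, after the test was logged as started.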
   Maybe I can isolate this timeout somehow.
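One possible explanation for a test logging PASSED after a timeout, offered as an assumption rather than a confirmed diagnosis: the `Timeout (0:20:00)!` line followed by `Thread 0x... (most recent call first):` matches the output format of Python's `faulthandler.dump_traceback_later`, which pytest's `faulthandler_timeout` ini option is built on. If the CI's 20-minute timeout works that way, it only dumps every thread's stack and lets the test keep running. The sketch below reproduces that behaviour in miniature; `slow_test` and `run_with_watchdog` are hypothetical names, and the 1-second timeout stands in for the 20 minutes on CI.

```python
import faulthandler
import tempfile
import time


def slow_test():
    """Hypothetical stand-in for a test that outlives the watchdog."""
    time.sleep(2)
    return "PASSED"


def run_with_watchdog(test, timeout_s):
    """Run `test` under a faulthandler watchdog that only dumps stacks."""
    with tempfile.TemporaryFile(mode="w+") as log:
        # After timeout_s seconds, write every thread's traceback to `log`.
        # exit=False (the default) leaves the process running afterwards.
        faulthandler.dump_traceback_later(timeout_s, exit=False, file=log)
        try:
            outcome = test()
        finally:
            faulthandler.cancel_dump_traceback_later()
        log.seek(0)
        return outcome, log.read()


outcome, dump = run_with_watchdog(slow_test, timeout_s=1)
print(dump.splitlines()[0])  # Timeout (0:00:01)!
print(outcome)               # PASSED
```

The watchdog header is written first and the test still returns normally afterwards, the same ordering seen in the CI log. A watchdog meant to turn the hang into a hard failure would need `exit=True` instead.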
   



