barry-jin opened a new issue #19420: URL: https://github.com/apache/incubator-mxnet/issues/19420
## Description

1. Running the GluonNLP [full suite of tests](https://github.com/dmlc/gluon-nlp/tree/master/tests) with `pytest` on `mxnet-cu102==2.0.0b20201022` aborts with a threading error (see Error Message).
2. Running the full suite of tests on `mxnet-cu102==2.0.0b20201016` does not produce this error.
3. Running these tests separately does not produce this error either.

### Error Message

<details>
<summary>Run GluonNLP pytest on `mxnet-cu102==2.0.0b20201022`</summary>

```
[2020-10-22T21:15:51.430Z] ============================= test session starts ==============================
[2020-10-22T21:15:51.430Z] platform linux -- Python 3.6.9, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
[2020-10-22T21:15:51.432Z] rootdir: /workspace/gluon-nlp, configfile: pytest.ini
[2020-10-22T21:15:51.432Z] plugins: cov-2.10.1
[2020-10-22T21:15:52.426Z] collected 1283 items
[2020-10-22T21:16:01.630Z] tests/test_attention_cell.py ........................................... [ 3%]
[2020-10-22T21:16:06.668Z] ...................................................................... [ 8%]
[2020-10-22T21:16:06.796Z] tests/test_data_batchify.py ............................................ [ 12%]
[2020-10-22T21:16:21.672Z] ................................. [ 14%]
[2020-10-22T21:16:30.051Z] tests/test_data_filtering.py ..... [ 15%]
[2020-10-22T21:16:36.895Z] tests/test_data_loading.py . [ 15%]
[2020-10-22T21:16:37.213Z] tests/test_data_sampler.py ............................................. [ 18%]
[2020-10-22T21:16:38.566Z] ........................................................................ [ 24%]
[2020-10-22T21:16:40.003Z] ........................................................................ [ 30%]
[2020-10-22T21:16:40.579Z] ........................................................................ [ 35%]
[2020-10-22T21:16:41.143Z] ........................................................................ [ 41%]
[2020-10-22T21:16:42.040Z] ........................................................................ [ 46%]
[2020-10-22T21:16:42.299Z] ............... [ 48%]
[2020-10-22T21:18:34.088Z] tests/test_data_tokenizers.py .............. [ 49%]
[2020-10-22T21:18:34.095Z] tests/test_data_vocab.py . [ 49%]
[2020-10-22T21:22:22.268Z] tests/test_embedding.py .. [ 49%]
[2020-10-22T21:22:59.289Z] tests/test_gluon_block.py ..... [ 49%]
[2020-10-22T21:22:59.328Z] tests/test_initializer.py ... [ 49%]
[2020-10-22T21:23:00.225Z] tests/test_layers.py ........................... [ 52%]
[2020-10-22T21:23:00.312Z] tests/test_loss.py ........................ [ 53%]
[2020-10-22T21:37:39.851Z] tests/test_models.py ................................................ [ 57%]
[2020-10-22T21:38:46.438Z] tests/test_models_albert.py ................. [ 59%]
[2020-10-22T21:39:38.599Z] tests/test_models_bart.py ...... [ 59%]
[2020-10-22T21:44:18.743Z] tests/test_models_bert.py ............ [ 60%]
[2020-10-22T21:46:00.142Z] tests/test_models_electra.py ........ [ 61%]
[2020-10-22T21:49:47.086Z] tests/test_models_gpt2.py .......F [ 61%]
[2020-10-22T21:49:57.226Z] tests/test_models_mobilebert.py ..... [ 62%]
[2020-10-22T21:51:27.552Z] tests/test_models_roberta.py ....FF [ 62%]
[2020-10-22T21:52:10.783Z] tests/test_models_transformer.py ....................................... [ 65%]
[2020-10-22T21:53:33.876Z] ........................................................................ [ 71%]
[2020-10-22T21:54:26.540Z] ..........................................FFFFF [ 74%]
[2020-10-22T21:54:34.975Z] tests/test_models_transformer_xl.py ...... [ 75%]
[2020-10-22T21:55:47.820Z] tests/test_models_xlmr.py .FF [ 75%]
[2020-10-22T21:55:48.122Z] tests/test_op.py ....................................................... [ 79%]
[2020-10-22T21:55:48.754Z] ........................................................................ [ 85%]
[2020-10-22T21:55:49.195Z] .... [ 85%]
[2020-10-22T21:56:20.712Z] tests/test_optimizer.py . [ 85%]
[2020-10-22T21:56:20.716Z] tests/test_pytest.py . [ 85%]
[2020-10-22T21:56:21.005Z] tests/test_sequence_sampler.py ......................................... [ 89%]
[2020-10-22T21:56:21.522Z] ........................................................................ [ 94%]
[2020-10-22T21:56:33.345Z] ....................................... [ 97%]
[2020-10-22T21:56:33.590Z] Fatal Python error: Aborted
[2020-10-22T21:56:33.590Z] Thread 0x00007f92b9fff700 (most recent call first):
[2020-10-22T21:56:33.590Z] File "/usr/lib/python3.6/threading.py", line 299 in wait
[2020-10-22T21:56:33.590Z] File "/usr/lib/python3.6/threading.py", line 551 in wait
[2020-10-22T21:56:33.590Z] File "/usr/local/lib/python3.6/dist-packages/tqdm/_monitor.py", line 59 in run
[2020-10-22T21:56:33.590Z] File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
[2020-10-22T21:56:33.590Z] File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
[2020-10-22T21:56:33.590Z] Current thread 0x00007f9457153740 (most recent call first):
[2020-10-22T21:56:33.590Z] File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 66 in _launch
[2020-10-22T21:56:33.590Z] File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19 in __init__
[2020-10-22T21:56:33.590Z] File "/usr/lib/python3.6/multiprocessing/context.py", line 277 in _Popen
[2020-10-22T21:56:33.590Z] File "/usr/lib/python3.6/multiprocessing/process.py", line 105 in start
[2020-10-22T21:56:33.590Z] File "/usr/lib/python3.6/multiprocessing/pool.py", line 239 in _repopulate_pool
[2020-10-22T21:56:33.591Z] File "/usr/lib/python3.6/multiprocessing/pool.py", line 174 in __init__
[2020-10-22T21:56:33.591Z] File "/usr/lib/python3.6/multiprocessing/context.py", line 119 in Pool
[2020-10-22T21:56:33.591Z] File "/workspace/gluon-nlp/tests/test_utils_misc.py", line 87 in verify_download
[2020-10-22T21:56:33.591Z] File "/workspace/gluon-nlp/tests/test_utils_misc.py", line 102 in test_download_s3
[2020-10-22T21:56:33.591Z] File "/root/.local/lib/python3.6/site-packages/_pytest/python.py", line 184 in pytest_pyfunc_call
[2020-10-22T21:56:33.591Z] File "/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
[2020-10-22T21:56:33.591Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
[2020-10-22T21:56:33.591Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
[2020-10-22T21:56:33.591Z] File "/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
[2020-10-22T21:56:33.591Z] File "/root/.local/lib/python3.6/site-packages/_pytest/python.py", line 1627 in runtest
[2020-10-22T21:56:33.591Z] File "/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 163 in pytest_runtest_call
[2020-10-22T21:56:33.591Z] File "/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
[2020-10-22T21:56:33.591Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 256 in <lambda>
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 310 in from_call
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 256 in call_runtest_hook
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 216 in call_and_report
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 127 in runtestprotocol
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/_pytest/runner.py", line 110 in pytest_runtest_protocol
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
[2020-10-22T21:56:33.592Z] File "/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/_pytest/main.py", line 338 in pytest_runtestloop
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/_pytest/main.py", line 313 in _main
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/_pytest/main.py", line 257 in wrap_session
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/_pytest/main.py", line 306 in pytest_cmdline_main
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
[2020-10-22T21:56:33.593Z] File "/root/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
[2020-10-22T21:56:33.594Z] File "/root/.local/lib/python3.6/site-packages/_pytest/config/__init__.py", line 165 in main
[2020-10-22T21:56:33.594Z] File "/root/.local/lib/python3.6/site-packages/_pytest/config/__init__.py", line 187 in console_main
[2020-10-22T21:56:33.594Z] File "/root/.local/lib/python3.6/site-packages/pytest/__main__.py", line 5 in <module>
[2020-10-22T21:56:33.594Z] File "/usr/lib/python3.6/runpy.py", line 85 in _run_code
[2020-10-22T21:56:33.594Z] File "/usr/lib/python3.6/runpy.py", line 193 in _run_module_as_main
[2020-10-22T22:00:07.664Z] ./gluon_nlp_job.sh: line 39: 44 Aborted (core dumped) /bin/bash -o pipefail -c "$COMMAND"
```

</details>

## To Reproduce

```
Compute Environment:
Instance type: g4dn.4x
vCPUs: 16

$ python3 -m pip install -U --quiet --pre "mxnet-cu102==2.0.0b20201022" -f https://dist.mxnet.io/python
$ git remote set-url origin https://github.com/dmlc/gluon-nlp
$ git fetch origin master:working
$ git checkout working
$ python3 -m pip install --quiet -e .[extras]
$ python3 -m pytest --cov=. --cov-config=./.coveragerc --cov-report=xml --durations=50 --device="gpu" --runslow ./tests/
```

## What have you tried to solve it?

Some observations:

1. The failed tests all use `mx.npx.waitall()`.
2. The run aborts at `multiprocessing.Pool()`, called from `verify_download` in `test_download_s3` (see the traceback above).

## Environment

***We recommend using our script for collecting the diagnostic information with the following command***

`curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`

<details>
<summary>Environment Information</summary>

```
Instance type: g4dn.4x
MXNet version: mxnet-cu102==2.0.0b20201022
python version: 3.6.9
CUDNN_VERSION: 7.6.5.32
CUDA_VERSION: 10.2.89
NCCL_VERSION: 2.7.8
```

</details>
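A possible isolation step, sketched under assumptions: the traceback shows the abort inside `multiprocessing/popen_fork.py`, so it may be worth checking whether it depends on the `fork` start method. The helper below is hypothetical (the name `pool_map_abs` is not part of GluonNLP or MXNet) and uses only the standard library. It builds the pool from an explicit context so the same call can be tried with `fork` and with `spawn`. Whether the start method is actually involved is an assumption, not something confirmed here.

```python
import multiprocessing as mp


def pool_map_abs(values, start_method="fork", processes=2):
    """Map the builtin abs() over `values` in a Pool with an explicit start method.

    Hypothetical helper for isolating the abort. abs() is used as the worker
    because a builtin pickles cleanly under every start method.
    """
    # get_context() returns a context bound to one start method, so the
    # process-wide default is left untouched.
    ctx = mp.get_context(start_method)
    # Leaving the with-block terminates the worker processes.
    with ctx.Pool(processes=processes) as pool:
        return pool.map(abs, values)
```

To mimic the failing sequence, one could call `mx.npx.waitall()` first and then call `pool_map_abs([-1, 2, -3], start_method="fork")`, followed by the same call with `start_method="spawn"` from a script guarded by `if __name__ == "__main__":`. If only the `fork` variant aborts, that would point at forking after the MXNet runtime has started; if both abort, the start method is probably not the cause.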
