xidulu opened a new issue #18198:
URL: https://github.com/apache/incubator-mxnet/issues/18198
## Description
I ran the GPU unit tests with the following command:
`DMLC_LOG_STACK_TRACE_DEPTH=10 MXNET_MODULE_SEED=781106105 MXNET_ENGINE_TYPE=NaiveEngine pytest tests/python/gpu/test_operator_gpu.py`
### Error Message
```
tests/python/gpu/test_operator_gpu.py .........s.s...................... [  5%]
.........................FFFFFsF.FFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFF [ 16%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 28%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFsFssFFFFFF [ 39%]
FFFFFFFFFFFFFFFFsFFFFFFFFFFFFFFFFFFFsFFFFFFFFFFFsFFFFFFFsFFFsFFFFFFFFFFF [ 50%]
FFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFsFFFFFFFFFFFFFFFFFFFFFFFFF [ 62%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFsFFFFFFFFFsFFFFFF.FFFFFFFFFFFF [ 73%]
FFFFFFFFFFFFFFFFFFF....F.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF....FFFFFFFFFFFFF [ 84%]
FFFxxxFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 96%]
FFFFFFFFFFFFFFFFFFFFFFFF                                                 [100%]
=================================== FAILURES ===================================
_______________________ test_batchnorm_backwards_notrain _______________________
args = (), kwargs = {}, test_count = 1, env_seed_str = None, i = 0
this_test_seed = 1871614074, log_level = 10
post_test_state = ('MT19937', array([ 793462385, 4162567913, 2690816661, 3146259572, 1379942102,
        894119658, 364406528, 36749442..., 3314795127, 3420630909, 2538379262,
       3698999054, 2822638424, 471751221, 3037373484], dtype=uint32), 1, 0, 0.0)
    @functools.wraps(orig_test)
    def test_new(*args, **kwargs):
        test_count = int(os.getenv('MXNET_TEST_COUNT', '1'))
        env_seed_str = os.getenv('MXNET_TEST_SEED')
        for i in range(test_count):
            if seed is not None:
                this_test_seed = seed
                log_level = logging.INFO
            elif env_seed_str is not None:
                this_test_seed = int(env_seed_str)
                log_level = logging.INFO
            else:
                this_test_seed = np.random.randint(0, np.iinfo(np.int32).max)
                log_level = logging.DEBUG
            post_test_state = np.random.get_state()
            np.random.seed(this_test_seed)
>           mx.random.seed(this_test_seed)
tests/python/unittest/common.py:206:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
python/mxnet/random.py:96: in seed
    check_call(_LIB.MXRandomSeed(seed_state))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ret = -1
    def check_call(ret):
        """Check the return value of C API call.
        This function will raise an exception when an error occurs.
        Wrap every API call with this function.
        Parameters
        ----------
        ret : int
            return value from API calls.
        """
        if ret != 0:
>           raise get_last_ffi_error()
E       mxnet.base.MXNetError: Traceback (most recent call last):
E         [bt] (5) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(MXRandomSeed+0x1a) [0x7f89ce0a0c0a]
E         [bt] (4) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(mxnet::resource::ResourceManagerImpl::SeedRandom(unsigned int)+0x30b) [0x7f89d11b081b]
E         [bt] (3) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(mxnet::engine::NaiveEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool)+0x43b) [0x7f89ce1d08eb]
E         [bt] (2) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::resource::ResourceManagerImpl::ResourceParallelRandom<mshadow::gpu>::SeedOne(unsigned long, unsigned int)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x1e) [0x7f89d11ac6ce]
E         [bt] (1) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(mxnet::common::random::RandGenerator<mshadow::gpu, float>::Seed(mshadow::Stream<mshadow::gpu>*, unsigned int)+0x1e9) [0x7f89d1240f55]
E         [bt] (0) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7f) [0x7f89cdf9b24f]
E         File "../src/common/random_generator.cu", line 58
E       Name: Check failed: err == cudaSuccess (10 vs. 0) : rand_generator_seed_kernel ErrStr:invalid device ordinal
python/mxnet/base.py:246: MXNetError
---------------------------- Captured stderr setup -----------------------------
WARNING:root:Unable to import numpy/mxnet. Skipping seeding for numpy/mxnet.
------------------------------ Captured log setup ------------------------------
WARNING  root:conftest.py:177 Unable to import numpy/mxnet. Skipping seeding for numpy/mxnet.
____________________ test_create_sparse_ndarray_gpu_to_cpu _____________________
```
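For readers following the traceback: the failing `mx.random.seed` call is reached only after the test decorator has picked a seed. The sketch below restates that precedence (a seed fixed by the decorator, then `MXNET_TEST_SEED`, then a random value) using only the standard library. `resolve_test_seed` is a hypothetical helper written for illustration, not an MXNet function, and `random.randrange` stands in for the `np.random.randint` call shown in the log.

```python
import logging
import random

# Same upper bound as np.iinfo(np.int32).max in the traceback.
INT32_MAX = 2**31 - 1


def resolve_test_seed(fixed_seed=None, environ=None):
    """Return (seed, log_level) following the precedence seen in the traceback.

    Hypothetical helper: the name and signature are assumptions for
    illustration and are not part of MXNet.
    """
    environ = {} if environ is None else environ
    env_seed_str = environ.get("MXNET_TEST_SEED")
    if fixed_seed is not None:
        # A seed fixed by the decorator takes priority.
        return fixed_seed, logging.INFO
    if env_seed_str is not None:
        # Otherwise the MXNET_TEST_SEED environment variable is used.
        return int(env_seed_str), logging.INFO
    # Fall back to a random seed (upper bound exclusive, as with np.random.randint).
    return random.randrange(0, INT32_MAX), logging.DEBUG


# Reusing the seed reported in the log above via the environment variable.
seed, level = resolve_test_seed(environ={"MXNET_TEST_SEED": "1871614074"})
```

In the log, `env_seed_str = None` and `log_level = 10` (the value of `logging.DEBUG`), which corresponds to the random-seed branch. The `invalid device ordinal` error is raised afterwards, inside the native `MXRandomSeed` call, so it does not appear to depend on which branch supplied the seed.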
## Environment
We recommend using our script for collecting the diagnostic information. Run
the following command and paste the outputs below:
```
curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
# paste outputs here
```