I have been using MXNet (1.6.0) for face recognition, but it unexpectedly 
reported an error after 2 epochs of otherwise normal training:
 ```
Traceback (most recent call last):
  File "train_0723.py", line 455, in <module>
    main()
  File "train_0723.py", line 451, in main
    train_net(args)
  File "train_0723.py", line 445, in train_net
    epoch_end_callback=epoch_cb)
  File "/home/user1/recognition/parall_module_local_v1_gluon_group.py", line 573, in fit
    self.update()
  File "/home/user1/recognition/parall_module_local_v1_gluon_group.py", line 406, in update
    mx.nd.waitall()
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 200, in waitall
    check_call(_LIB.MXNDArrayWaitAll())
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/base.py", line 255, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [03:32:38] /home/ubuntu/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered
Stack trace:
  [bt] (0) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6b41eb) [0x7f76131a51eb]
  [bt] (1) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37b2742) [0x7f76162a3742]
  [bt] (2) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37e3515) [0x7f76162d4515]
  [bt] (3) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37bf6d1) [0x7f76162b06d1]
  [bt] (4) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37c2c10) [0x7f76162b3c10]
  [bt] (5) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37c2ea6) [0x7f76162b3ea6]
  [bt] (6) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37bde84) [0x7f76162aee84]
  [bt] (7) /home/user1/miniconda3/bin/../lib/libstdc++.so.6(+0xc8421) [0x7f76aca9d421]
  [bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f76bb1f0609]
```

 
I haven't found any clue for solving this error after googling. The only thing 
I have changed so far is decreasing my batch_size from 400 to 360, and I'm not 
sure whether the error will occur again... still worried about that :frowning:
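
One thing that might help narrow this down: MXNet runs operators asynchronously, so the Python traceback only shows where the error surfaced (`mx.nd.waitall()`), not necessarily the operator that triggered it. Below is a minimal sketch, not taken from my training script, that sets two environment variables before `mxnet` is imported so that execution becomes synchronous and the failure is reported closer to its source. `MXNET_ENGINE_TYPE=NaiveEngine` is a documented MXNet setting and `CUDA_LAUNCH_BLOCKING=1` is a CUDA runtime setting; the helper name `enable_sync_debugging` is made up for illustration, and whether this actually pinpoints the faulty operator here is an assumption.

```python
import os

# Debugging-only settings; both make training noticeably slower.
DEBUG_ENV = {
    "MXNET_ENGINE_TYPE": "NaiveEngine",  # MXNet: run operators synchronously
    "CUDA_LAUNCH_BLOCKING": "1",         # CUDA runtime: make kernel launches blocking
}


def enable_sync_debugging(env=os.environ):
    """Hypothetical helper: apply DEBUG_ENV and return the previous values."""
    previous = {key: env.get(key) for key in DEBUG_ENV}
    env.update(DEBUG_ENV)
    return previous


# Must run before `import mxnet`, since the engine type is read at start-up.
enable_sync_debugging()

# import mxnet as mx      # import only after the variables are set
# train_net(args)         # then start training as usual
```

The same effect can be had by exporting the two variables in the shell before launching `train_0723.py`; they should be removed again for normal training runs.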




