RuRo edited a comment on issue #18090:
URL:
https://github.com/apache/incubator-mxnet/issues/18090#issuecomment-617373177
Are you sure that the backtrace you got after interrupting is in any way
related to this issue? It doesn't seem likely to me.
1) The backtrace you got is in `_ctypes/ndarray.py`. The issue I was able to
reproduce is in the `cython` bindings. If I understand correctly, when `cython`
is used, `_ctypes` isn't even imported; `_cy3.ndarray` is used instead. I
mentioned that I also sometimes got `Segmentation Fault:11` after interrupting
the script, but I don't get the `Check failed: ndim() >= 0 (-1 vs. 0)` error.
2) I don't see anything too incriminating in that backtrace. In my
experience, mxnet (or any other package with C bindings) often spews random
errors if interrupted. For example, you can also get <details><summary>this
backtrace</summary>
```
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 237, in 'calling callback function'
  File "/usr/lib/python3.8/site-packages/mxnet/operator.py", line 973, in declare_backward_dependency_entry
    rdeps = cast(c_array_buf(c_int, array('i', rdeps)), c_int_p)
  File "/usr/lib/python3.8/site-packages/mxnet/base.py", line 474, in c_array_buf
    return (ctype * len(buf)).from_buffer(buf)
KeyboardInterrupt
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    mx_x = mx.np.empty_like(mx_data)
  File "/usr/lib/python3.8/site-packages/mxnet/numpy/multiarray.py", line 2582, in empty_like
    return _mx_nd_np.empty_like(prototype, dtype=dtype, order=order, subok=subok, shape=shape)
  File "/usr/lib/python3.8/site-packages/mxnet/ndarray/numpy/_op.py", line 512, in empty_like
    return _npi.empty_like_fallback(prototype, dtype=dtype, order=order, subok=subok, shape=shape)
  File "<string>", line 40, in Custom
  File "mxnet/cython/ndarray.pyx", line 219, in mxnet._cy3.ndarray._imperative_invoke
  File "mxnet/cython/./base.pyi", line 41, in mxnet._cy3.ndarray.CALL
mxnet.base.MXNetError: Traceback (most recent call last):
  File "/home/custompkgs/PKGBUILDS/mxnet-ruro-git/src/mxnet-ruro-git/src/operator/custom/custom.cc", line 121
MXNetError: Check failed: reinterpret_cast<CustomOpBwdDepFunc>(
params.info->callbacks[kCustomOpPropDeclareBackwardDependency])(
out_grad.data(), in_data.data(), out_data.data(), &num_dep, &rdeps,
params.info->contexts[kCustomOpPropDeclareBackwardDependency]):
```
</details> or <details><summary>this backtrace with a segfault</summary>
```
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 237, in 'calling callback function'
  File "/usr/lib/python3.8/site-packages/mxnet/operator.py", line 935, in list_arguments_entry
    try:
KeyboardInterrupt
Segmentation fault: 11
terminate called without an active exception
[1] 1437249 abort (core dumped) python test.py
```
</details>.
3) I grepped around a little bit, and it *seems* to me that every call to
`cffi`/`cython` is properly wrapped with `check_call` or the equivalent, and I
don't see any suspicious `try/except` clauses in the Python part of the
`CustomOp` implementation.
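
One way to check point 1 on a given install is to look at which binding modules actually ended up in `sys.modules`. This is only a minimal sketch: the two module names are taken from the backtraces in this thread, `loaded_backends` is a hypothetical helper (not part of mxnet), and I'm assuming that the presence of a module in `sys.modules` is a good enough proxy for "this binding is in use".

```python
import sys

# Module names as they appear in the backtraces in this thread.
# Treating these two as the only relevant ones is an assumption.
CTYPES_BACKEND = "mxnet._ctypes.ndarray"
CYTHON_BACKEND = "mxnet._cy3.ndarray"


def loaded_backends(modules=None):
    """Report which of the two binding modules are present in the module table.

    `modules` defaults to sys.modules; passing a plain dict makes the helper
    usable without mxnet being installed.
    """
    modules = sys.modules if modules is None else modules
    return {
        "ctypes": CTYPES_BACKEND in modules,
        "cython": CYTHON_BACKEND in modules,
    }
```

Calling `loaded_backends()` right after `import mxnet` in the reproduction script should show whether the `_ctypes` frames in a backtrace can come from the same configuration as the `_cy3` ones.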
Also, even if there are unhandled errors somewhere in `CustomOp`, I still
don't see how that would cause a deadlock.
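
For what it's worth, the "random errors on interrupt" behaviour from point 2 can be imitated without mxnet at all. The sketch below uses only the standard `ctypes` module: an exception raised inside a Python function that is called through a `ctypes` callback is not propagated to the caller, it is only reported, and the caller gets back an unspecified return value. `interrupted_callback` and `call_through_ctypes` are hypothetical names for illustration, and raising `KeyboardInterrupt` by hand is just a stand-in for a real Ctrl-C. As far as I know, recent CPython versions report the swallowed exception through `sys.unraisablehook`, while older ones print it directly to stderr, as in the backtraces above.

```python
import ctypes
import sys


def interrupted_callback():
    # Stand-in for a Ctrl-C arriving while a Python callback is running.
    raise KeyboardInterrupt


def call_through_ctypes(func):
    """Call func through a ctypes callback thunk.

    Returns (propagated, swallowed): whether an exception reached the caller,
    and the exception types reported via sys.unraisablehook (may stay empty
    on Python versions that print to stderr instead).
    """
    swallowed = []
    previous_hook = sys.unraisablehook
    sys.unraisablehook = lambda args: swallowed.append(args.exc_type)
    try:
        c_callback = ctypes.CFUNCTYPE(ctypes.c_int)(func)
        c_callback()  # the return value is not meaningful if func raised
        propagated = False
    except BaseException:
        propagated = True
    finally:
        sys.unraisablehook = previous_hook
    return propagated, swallowed
```

With `call_through_ctypes(interrupted_callback)`, the interrupt is reported but the call itself returns normally, which matches the pattern in the first backtrace: the callback prints `KeyboardInterrupt`, and the C++ side then fails its `Check failed` on the bogus return value. It shows an error being surfaced, not a deadlock.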