RuRo edited a comment on issue #18090:
URL: https://github.com/apache/incubator-mxnet/issues/18090#issuecomment-617373177


   Are you sure that the backtrace you got after interrupting is in any way 
related to this issue? It doesn't seem likely to me.
   1) The backtrace you got is in `_ctypes/ndarray.py`. The issue I was able to 
reproduce is in the `cython` bindings. If I understand correctly, when `cython` 
is used, `_ctypes` isn't even imported; `_cy3.ndarray` is used instead. I 
mentioned that I also sometimes got `Segmentation fault: 11` after interrupting 
the script, but I don't get the `Check failed: ndim() >= 0 (-1 vs. 0)` error.
   2) I don't see anything too incriminating in that backtrace. In my 
experience, mxnet (or any other package with C bindings) often spews random 
errors if interrupted. For example, you can also get <details><summary>this 
backtrace</summary>
   
       ```
       Traceback (most recent call last):
         File "_ctypes/callbacks.c", line 237, in 'calling callback function'
         File "/usr/lib/python3.8/site-packages/mxnet/operator.py", line 973, in declare_backward_dependency_entry
           rdeps = cast(c_array_buf(c_int, array('i', rdeps)), c_int_p)
         File "/usr/lib/python3.8/site-packages/mxnet/base.py", line 474, in c_array_buf
           return (ctype * len(buf)).from_buffer(buf)
       KeyboardInterrupt
       Traceback (most recent call last):
         File "test.py", line 6, in <module>
           mx_x = mx.np.empty_like(mx_data)
         File "/usr/lib/python3.8/site-packages/mxnet/numpy/multiarray.py", line 2582, in empty_like
           return _mx_nd_np.empty_like(prototype, dtype=dtype, order=order, subok=subok, shape=shape)
         File "/usr/lib/python3.8/site-packages/mxnet/ndarray/numpy/_op.py", line 512, in empty_like
           return _npi.empty_like_fallback(prototype, dtype=dtype, order=order, subok=subok, shape=shape)
         File "<string>", line 40, in Custom
         File "mxnet/cython/ndarray.pyx", line 219, in mxnet._cy3.ndarray._imperative_invoke
         File "mxnet/cython/./base.pyi", line 41, in mxnet._cy3.ndarray.CALL
       mxnet.base.MXNetError: Traceback (most recent call last):
         File "/home/custompkgs/PKGBUILDS/mxnet-ruro-git/src/mxnet-ruro-git/src/operator/custom/custom.cc", line 121
       MXNetError: Check failed: reinterpret_cast<CustomOpBwdDepFunc>( params.info->callbacks[kCustomOpPropDeclareBackwardDependency])( out_grad.data(), in_data.data(), out_data.data(), &num_dep, &rdeps, params.info->contexts[kCustomOpPropDeclareBackwardDependency]): 
       ```
       
       </details> or <details><summary>this backtrace with a segfault</summary>
       
       ```
       Traceback (most recent call last):
         File "_ctypes/callbacks.c", line 237, in 'calling callback function'
         File "/usr/lib/python3.8/site-packages/mxnet/operator.py", line 935, in list_arguments_entry
           try:
       KeyboardInterrupt
       
       Segmentation fault: 11
       
       terminate called without an active exception
       [1]    1437249 abort (core dumped)  python test.py
       ```
       
       </details>.
   3) I grepped around a little bit, and it *seems* to me that every call to 
`cffi`/`cython` is properly wrapped with `check_call` or the equivalent, and I 
don't see any suspicious `try/except` clauses in the Python part of the 
`CustomOp` implementation.
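
To make point 3 concrete, here is a minimal sketch of the kind of status-code wrapper I mean by `check_call`. It is not mxnet's actual code: `NativeError`, `fake_c_call` and the `get_last_error` parameter are hypothetical stand-ins, and I am only assuming the usual convention that a C API call returns `0` on success and nonzero on failure.

```python
class NativeError(RuntimeError):
    """Hypothetical stand-in for the error type raised by the bindings."""


def check_call(ret, get_last_error=lambda: "unknown native error"):
    """Raise NativeError if a C API call returned a nonzero status code.

    `get_last_error` is an assumed hook that fetches the error message
    recorded by the native library for the failed call.
    """
    if ret != 0:
        raise NativeError(get_last_error())


def fake_c_call(should_fail):
    """Hypothetical C function: returns 0 on success, -1 on failure."""
    return -1 if should_fail else 0


# A successful call passes through silently.
check_call(fake_c_call(False))
```

With every native call going through a wrapper like this, a failed call surfaces as a Python exception instead of being silently ignored, which is why I was looking for unwrapped calls in the first place.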
   
   Also, even if there are unhandled errors somewhere in `CustomOp`, I still 
don't see how that would cause a deadlock.
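
Regarding point 1, one way to confirm which binding a given process actually loaded is to look in `sys.modules` after importing mxnet. This is only a sketch: the helper `loaded_ndarray_backends` is hypothetical, and the full module name `mxnet._ctypes.ndarray` is my assumption based on the `_ctypes/ndarray.py` path (`mxnet._cy3.ndarray` is taken from the traceback above).

```python
import sys

# Module names: the cython one appears in the traceback above; the ctypes one
# is assumed from the `_ctypes/ndarray.py` file path.
CYTHON_MODULE = "mxnet._cy3.ndarray"
CTYPES_MODULE = "mxnet._ctypes.ndarray"


def loaded_ndarray_backends(modules=None):
    """Return the ndarray binding modules that are present in `modules`.

    `modules` defaults to `sys.modules`, i.e. whatever the current process
    has imported so far.
    """
    if modules is None:
        modules = sys.modules
    return [name for name in (CYTHON_MODULE, CTYPES_MODULE) if name in modules]
```

Calling `loaded_ndarray_backends()` right after `import mxnet` in the reproduction script should show whether `_ctypes` is in play at all for that run.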

