RuRo edited a comment on issue #18090:
URL: 
https://github.com/apache/incubator-mxnet/issues/18090#issuecomment-617628230


   A little further investigation in GDB with some debug symbols revealed the 
following information:
   
   Thread 1 (main thread) is in `CustomOperator::Find`. (it shows up as 
[`mxnet::op::custom::AttrParser`](https://github.com/apache/incubator-mxnet/blob/6a809aaca12ea4dd31d1c2d8121b881b25eb916b/src/operator/custom/custom.cc#L98)
 in the backtrace because of inlining). Thread 1 is [waiting for the 
`CustomOperator::mutex_`](https://github.com/apache/incubator-mxnet/blob/6a809aaca12ea4dd31d1c2d8121b881b25eb916b/src/operator/custom/custom-inl.h#L63),
 which is currently held by Thread 13.
   
   Thread 13 is in `_CallPythonObject` (python internals called from `libffi`), 
trying to call 
`register.do_register.creator.create_operator_entry.delete_entry` which is a 
callback for `CustomOp::del`. To do that, it tries to acquire the Global 
Interpreter Lock, which is currently held by Thread 1.
   
   I am really not sure, how could Thread 13 arrive at the `CustomOp` deleter, 
while holding the `CustomOperator::mutex`.
   
   <details>
   <summary>GDB logs with evidence</summary>
   
   ```c
   (gdb) run test.py
   (gdb) backtrace
   #0  0x00007ffff79a14cf in __lll_lock_wait () at /usr/lib/libpthread.so.0
   #1  0x00007ffff7999e03 in pthread_mutex_lock () at /usr/lib/libpthread.so.0
   #2  0x00007fffc69c0b4e in mxnet::op::custom::AttrParser(nnvm::NodeAttrs*) () 
at /usr/lib/python3.8/site-packages/mxnet/libmxnet.so
   #3  0x00007fffc62cc3db in MXImperativeInvokeImpl(void*, int, void**, int*, 
void***, int, char const**, char const**) () at 
/usr/lib/python3.8/site-packages/mxnet/libmxnet.so
   #4  0x00007fffc62cd1da in MXImperativeInvokeEx () at 
/usr/lib/python3.8/site-packages/mxnet/libmxnet.so
   #5  0x00007fff79b8f43b in __pyx_pf_5mxnet_4_cy3_7ndarray_2_imperative_invoke 
(__pyx_self=<optimized out>, __pyx_v_output_is_list=<optimized out>, 
__pyx_v_is_np_op=<optimized out>, __pyx_v_out=<optimized out>, 
__pyx_v_vals=<optimized out>, __pyx_v_keys=<optimized out>, 
__pyx_v_ndargs=<optimized out>, __pyx_v_handle=<optimized out>) at 
/usr/include/c++/9.3.0/bits/stl_vector.h:915
   #6  __pyx_pw_5mxnet_4_cy3_7ndarray_3_imperative_invoke(PyObject*, PyObject*, 
PyObject*) (__pyx_self=<optimized out>, __pyx_args=<optimized out>, 
__pyx_kwds=<optimized out>) at mxnet/cython/ndarray.cpp:5334
   #7  0x00007ffff7b59968 in cfunction_call_varargs (kwargs=0x0, 
args=(93825001767984, [<ndarray at remote 0x7ffef526bb90>], ['op_type', 
'dtype', 'order', 'subok', 'shape'], ['empty_like_fallback', 'None', 'C', 
False, None], None, True, False), func=<built-in function _imperative_invoke>) 
at Objects/call.c:742
   #8  PyCFunction_Call (func=<built-in function _imperative_invoke>, 
args=(93825001767984, [<ndarray at remote 0x7ffef526bb90>], ['op_type', 
'dtype', 'order', 'subok', 'shape'], ['empty_like_fallback', 'None', 'C', 
False, None], None, True, False), kwargs=0x0) at Objects/call.c:772
   #9  0x00007ffff7b1325f in _PyObject_MakeTpCall (callable=<built-in function 
_imperative_invoke>, args=<optimized out>, nargs=<optimized out>, 
keywords=<optimized out>) at Objects/call.c:168
   #10 0x00007ffff7b7e4a8 in _PyObject_Vectorcall (kwnames=0x0, 
nargsf=9223372036854775815, args=0x555557e486f0, callable=<optimized out>) at 
Python/ceval.c:3493
   #11 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic 
pointer>, tstate=0x55555557e5f0) at Python/ceval.c:4987
   #12 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) 
at Python/ceval.c:3500
   #13 0x00007ffff7b4bc9a in _PyEval_EvalCodeWithName (_co=<code at remote 
0x7fff78d621e0>, globals=<optimized out>, locals=<optimized out>, 
args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, 
kwargs=0x7fff142e60e8, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=0, 
kwdefs=0x0, closure=0x0, name='Custom', qualname='Custom') at 
Python/ceval.c:4298
   #14 0x00007ffff7b4c969 in _PyFunction_Vectorcall (func=<optimized out>, 
stack=0x7fff142e60e0, nargsf=<optimized out>, kwnames=<optimized out>) at 
Objects/call.c:435
   #15 0x00007ffff7b476da in _PyObject_FastCallDict (callable=<function at 
remote 0x7fff78d68eb0>, args=0x7ffef55f16f8, nargsf=<optimized out>, 
kwargs=<optimized out>) at Objects/call.c:104
   #16 0x00007ffff7bf2fbe in partial_fastcall (pto=<optimized out>, 
pto=<optimized out>, kwargs=Python Exception <class 'gdb.error'> Attempt to 
extract a component of a value that is not a struct/class/union.: 
   , nargs=1, args=<optimized out>) at ./Modules/_functoolsmodule.c:169
   #17 partial_call (pto=0x7fff17ec30c0, args=<optimized out>, 
kwargs=<optimized out>) at ./Modules/_functoolsmodule.c:224
   #18 0x00007ffff7b1325f in _PyObject_MakeTpCall (callable=<functools.partial 
at remote 0x7fff17ec30c0>, args=<optimized out>, nargs=<optimized out>, 
keywords=<optimized out>) at Objects/call.c:168
   #19 0x00007ffff7b83429 in _PyObject_Vectorcall (kwnames=<optimized out>, 
nargsf=<optimized out>, args=0x555557e4ce90, callable=<functools.partial at 
remote 0x7fff17ec30c0>) at Python/ceval.c:1555
   #20 call_function (kwnames=<optimized out>, oparg=<optimized out>, 
pp_stack=<synthetic pointer>, tstate=0x55555557e5f0) at Python/ceval.c:4987
   #21 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) 
at Python/ceval.c:3515
   #22 0x00007ffff7b4bc9a in _PyEval_EvalCodeWithName (_co=<code at remote 
0x7fff78f97520>, globals=<optimized out>, locals=<optimized out>, 
args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, 
kwargs=0x7fff17ed41f0, kwcount=<optimized out>, kwstep=1, defs=0x7fff78fb2488, 
defcount=4, kwdefs=0x0, closure=0x0, name='empty_like', qualname='empty_like') 
at Python/ceval.c:4298
   #23 0x00007ffff7b4c969 in _PyFunction_Vectorcall (func=<optimized out>, 
stack=0x7fff17ed41e8, nargsf=<optimized out>, kwnames=<optimized out>) at 
Objects/call.c:435
   #24 0x00007ffff7b7ecb9 in _PyObject_Vectorcall (kwnames=('dtype', 'order', 
'subok', 'shape'), nargsf=<optimized out>, args=0x7fff17ed41e8, 
callable=<function at remote 0x7fff78fab690>) at ./Include/cpython/abstract.h:92
   #25 call_function (kwnames=('dtype', 'order', 'subok', 'shape'), 
oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555557e5f0) at 
Python/ceval.c:4987
   #26 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) 
at Python/ceval.c:3515
   #27 0x00007ffff7b4bc9a in _PyEval_EvalCodeWithName (_co=<code at remote 
0x7fff78bd5ba0>, globals=<optimized out>, locals=<optimized out>, 
args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, 
kwargs=0x5555555e49e8, kwcount=<optimized out>, kwstep=1, defs=0x7fff78be8a88, 
defcount=4, kwdefs=0x0, closure=0x0, name='empty_like', qualname='empty_like') 
at Python/ceval.c:4298
   #28 0x00007ffff7b4c969 in _PyFunction_Vectorcall (func=<optimized out>, 
stack=0x5555555e49e0, nargsf=<optimized out>, kwnames=<optimized out>) at 
Objects/call.c:435
   #29 0x00007ffff7b7e2ea in _PyObject_Vectorcall (kwnames=0x0, 
nargsf=9223372036854775809, args=0x5555555e49e0, callable=<function at remote 
0x7fff6be6d050>) at ./Include/cpython/abstract.h:92
   #30 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic 
pointer>, tstate=0x55555557e5f0) at Python/ceval.c:4987
   #31 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) 
at Python/ceval.c:3469
   #32 0x00007ffff7b4bc9a in _PyEval_EvalCodeWithName (_co=<code at remote 
0x7ffff73952b0>, globals=<optimized out>, locals=<optimized out>, 
args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, 
kwargs=0x0, kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=0, 
kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4298
   #33 0x00007ffff7be52da in PyEval_EvalCodeEx (_co=<optimized out>, 
globals=<optimized out>, locals=<optimized out>, args=<optimized out>, 
argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, 
kwdefs=0x0, closure=0x0) at Python/ceval.c:4327
   #34 0x00007ffff7be52fc in PyEval_EvalCode (co=co@entry=<code at remote 
0x7ffff73952b0>, globals=globals@entry=Python Exception <class 'gdb.error'> 
Attempt to extract a component of a value that is not a struct/class/union.: 
   , locals=locals@entry=Python Exception <class 'gdb.error'> Attempt to 
extract a component of a value that is not a struct/class/union.: 
   ) at Python/ceval.c:718
   #35 0x00007ffff7be53aa in run_eval_code_obj (co=0x7ffff73952b0, 
globals=Python Exception <class 'gdb.error'> Attempt to extract a component of 
a value that is not a struct/class/union.: 
   , locals=Python Exception <class 'gdb.error'> Attempt to extract a component 
of a value that is not a struct/class/union.: 
   ) at Python/pythonrun.c:1125
   #36 0x00007ffff7c2a134 in run_mod (mod=<optimized out>, filename=<optimized 
out>, globals=Python Exception <class 'gdb.error'> Attempt to extract a 
component of a value that is not a struct/class/union.: 
   , locals=Python Exception <class 'gdb.error'> Attempt to extract a component 
of a value that is not a struct/class/union.: 
   , flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1147
   #37 0x00007ffff7af1a95 in PyRun_FileExFlags (fp=fp@entry=0x555555621020, 
filename_str=filename_str@entry=0x7ffff742aee0 "test.py", 
start=start@entry=257, globals=globals@entry=Python Exception <class 
'gdb.error'> Attempt to extract a component of a value that is not a 
struct/class/union.: 
   , locals=locals@entry=Python Exception <class 'gdb.error'> Attempt to 
extract a component of a value that is not a struct/class/union.: 
   , closeit=closeit@entry=1, flags=0x7ffffffe0578) at Python/pythonrun.c:1063
   #38 0x00007ffff7af4600 in PyRun_SimpleFileExFlags 
(fp=fp@entry=0x555555621020, filename=<optimized out>, closeit=closeit@entry=1, 
flags=flags@entry=0x7ffffffe0578) at Python/pythonrun.c:428
   #39 0x00007ffff7af4995 in PyRun_AnyFileExFlags (fp=fp@entry=0x555555621020, 
filename=0x0, closeit=closeit@entry=1, flags=flags@entry=0x7ffffffe0578) at 
Python/pythonrun.c:86
   #40 0x00007ffff7c2c796 in pymain_run_file (cf=0x7ffffffe0578, 
config=0x55555557d8e0) at Modules/main.c:381
   #41 pymain_run_python (exitcode=0x7ffffffe0570) at Modules/main.c:565
   #42 Py_RunMain () at Modules/main.c:644
   #43 0x00007ffff7c2c889 in Py_BytesMain (argc=<optimized out>, 
argv=<optimized out>) at Modules/main.c:698
   #44 0x00007ffff7dbd023 in __libc_start_main () at /usr/lib/libc.so.6
   #45 0x000055555555505e in _start ()
   (gdb) print (**(pthread_mutex_t **)($sp+0x10)).__data.__owner
   2252813
   (gdb) info thread
     Id   Target Id                                    Frame 
     ...
     13   Thread 0x7ffef77fe700 (LWP 2252813) "python" 0x00007ffff799e01a in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
   (gdb) thread 13
   (gdb) backtrace
   #0  0x00007ffff799e01a in pthread_cond_timedwait@@GLIBC_2.3.2 () at 
/usr/lib/libpthread.so.0
   #1  0x00007ffff7b0a9fb in PyCOND_TIMEDWAIT (us=<optimized out>, 
mut=0x7ffff7d759b0 <_PyRuntime+1232>, cond=0x7ffff7d75980 <_PyRuntime+1184>) at 
Python/condvar.h:73
   #2  take_gil (tstate=0x7ffee80022b0, ceval=0x7ffff7d75728 <_PyRuntime+584>) 
at Python/ceval_gil.h:206
   #3  PyEval_RestoreThread (tstate=0x7ffee80022b0) at Python/ceval.c:399
   #4  0x00007ffff7a30b8d in PyGILState_Ensure () at Python/pystate.c:1298
   #5  0x00007ffff7217d4c in _CallPythonObject (pArgs=0x7ffef77fdc50, 
flags=<optimized out>, converters=(<_ctypes.PyCSimpleType at remote 
0x5555556405f0>,), callable=<function at remote 0x7ffef568e050>, 
setfunc=<optimized out>, restype=0x7ffff725b638, mem=0x7ffef77fddd0) at 
/home/custompkgs/PKGBUILDS/python-dbg/src/Python-3.8.2/Modules/_ctypes/callbacks.c:145
   #6  closure_fcn (cif=<optimized out>, resp=0x7ffef77fddd0, 
args=0x7ffef77fdc50, userdata=<optimized out>) at 
/home/custompkgs/PKGBUILDS/python-dbg/src/Python-3.8.2/Modules/_ctypes/callbacks.c:297
   #7  0x00007ffff71938c2 in  () at /usr/lib/libffi.so.7
   #8  0x00007ffff7193c20 in  () at /usr/lib/libffi.so.7
   #9  0x00007fffc69ba782 in  () at 
/usr/lib/python3.8/site-packages/mxnet/libmxnet.so
   #10 0x00007fffc624b0b7 in 
std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () at 
/usr/lib/python3.8/site-packages/mxnet/libmxnet.so
   #11 0x00007fffc69bb33e in  () at 
/usr/lib/python3.8/site-packages/mxnet/libmxnet.so
   #12 0x00007fffc69bfeca in  () at 
/usr/lib/python3.8/site-packages/mxnet/libmxnet.so
   #13 0x00007fffc62c1794 in 
std::thread::_State_impl<std::thread::_Invoker<std::tuple<mxnet::op::custom::CustomOperator::SetNumThreads(int)::{lambda()#1}>
 > >::_M_run() () at /usr/lib/python3.8/site-packages/mxnet/libmxnet.so
   #14 0x00007fff7df19b24 in std::execute_native_thread_routine(void*) 
(__p=0x555557d8b890) at /build/gcc/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
   #15 0x00007ffff799746f in start_thread () at /usr/lib/libpthread.so.0
   #16 0x00007ffff7e953d3 in clone () at /usr/lib/libc.so.6
   (gdb) print *(PyFunctionObject *)0x7ffef568e050
   {
     ...
     func_doc = 'C Callback for CustomOp::del',
     func_name = 'delete_entry',
     func_dict = 0x0,
     func_weakreflist = 0x0,
     func_module = 'mxnet.operator',
     func_annotations = 0x0,
     func_qualname = 
'register.<locals>.do_register.<locals>.creator.<locals>.create_operator_entry.<locals>.delete_entry',
     vectorcall = 0x7ffff7b4c750 <_PyFunction_Vectorcall>
   }
   (gdb) print/x ((PyThreadState*)_PyRuntime.ceval.gil.last_holder)->thread_id
   0x7ffff7839740
   (gdb) info thread
     Id   Target Id                                    Frame 
     1    Thread 0x7ffff7839740 (LWP 2252671) "python" 0x00007ffff79a14cf in 
__lll_lock_wait () from /usr/lib/libpthread.so.0
     ...
   ```
   
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to