eric-haibin-lin edited a comment on issue #18014: enabling mkldnn leads to 
segfault in bytePS
URL: 
https://github.com/apache/incubator-mxnet/issues/18014#issuecomment-612137404
 
 
   With @leezu's help I built MXNet in debug mode at commit 
2f6cdd383abbf46a37b84a5fad013726b5c62169. Here is the stack trace with line 
numbers: 
   ```
   Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
   __GI___pthread_mutex_lock (mutex=0x20) at ../nptl/pthread_mutex_lock.c:65
   65      ../nptl/pthread_mutex_lock.c: No such file or directory.
   #0  __GI___pthread_mutex_lock (mutex=0x20) at ../nptl/pthread_mutex_lock.c:65
   #1  0x00007fff5314d857 in __gthread_mutex_lock (__mutex=0x20) at 
/usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:748
   #2  0x00007fff5316b1c6 in std::mutex::lock (this=0x20) at 
/usr/include/c++/7/bits/std_mutex.h:103
   #3  0x00007fff53182d14 in std::lock_guard<std::mutex>::lock_guard 
(this=0x7fffffffcb30, __m=...) at /usr/include/c++/7/bits/std_mutex.h:162
   #4  0x00007fff53359852 in mxnet::engine::ThreadedVar::AppendWriteDependency 
(this=0x0, opr_block=0x294c2a8) at ../src/engine/threaded_engine.cc:74
   #5  0x00007fff53355fd2 in mxnet::engine::ThreadedEngine::Push 
(this=0x2946590, op=0x294a708, exec_ctx=..., priority=0, profiling=false) at 
../src/engine/threaded_engine.cc:311
   #6  0x00007fff53356400 in 
mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, 
mxnet::engine::CallbackOnComplete)>, mxnet::Context, 
std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, 
std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, 
mxnet::FnProperty, int, char const*, bool) (this=0x2946590, fn=..., 
exec_ctx=..., const_vars=std::vector of length 0, capacity 0, 
mutable_vars=std::vector of length 1, capacity 1 = {...}, 
prop=mxnet::FnProperty::kCPUPrioritized, priority=0, opr_name=0x7ffe356374ac 
"BytePSPushPull", wait=false) at ../src/engine/threaded_engine.cc:343
   #7  0x00007fff53164592 in MXEnginePushAsync (async_func=0x7ffe355663e0 
<byteps::mxnet::DoPushPull(void*, void*, void*)>, func_param=0x40f29520, 
deleter=0x7ffe35565a30 <byteps::mxnet::(anonymous 
namespace)::DeletePushPullParam(void*)>, ctx_handle=0x7ffe3bff8a40 
<byteps::mxnet::(anonymous namespace)::MX_EXEC_CTX>, const_vars_handle=0x0, 
num_const_vars=0, mutable_vars_handle=0x7fffffffd168, num_mutable_vars=1, 
prop_handle=0x7ffe35637540 <byteps::mxnet::(anonymous 
namespace)::MX_FUNC_PROP>, priority=0, opr_name=0x7ffe356374ac 
"BytePSPushPull", wait=false) at ../src/c_api/c_api.cc:2482
   #8  0x00007ffe3556618d in byteps::mxnet::byteps_mxnet_push_pull_async 
(tensor=0x409bf190, name=<optimized out>, version=0, priority=0, 
is_average=<optimized out>) at byteps/mxnet/ops.cc:116
   #9  0x00007ffff65a6dae in ffi_call_unix64 () from 
/usr/lib/x86_64-linux-gnu/libffi.so.6
   #10 0x00007ffff65a671f in ffi_call () from 
/usr/lib/x86_64-linux-gnu/libffi.so.6
   #11 0x00007ffff67ba5c4 in _ctypes_callproc () from 
/usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
   #12 0x00007ffff67bac33 in ?? () from 
/usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
   ```
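
One way to read frames #0 to #4 above: frame #4 shows `this=0x0` for `ThreadedVar::AppendWriteDependency`, and the mutex being locked is at address `0x20`. That is consistent with a null var pointer whose mutex member sits 0x20 bytes into the object, which would suggest the single entry in `mutable_vars` was null when it reached the engine. This is an inference from the trace, not a confirmed diagnosis. The sketch below only illustrates the address arithmetic; `FakeVar`, `MutexAddressFor`, and `AllVarsNonNull` are hypothetical names, and the layout is made up for illustration and is not MXNet's real `ThreadedVar`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for an engine variable.
// This is NOT MXNet's real ThreadedVar layout; the padding is chosen only so
// that the mutex lands at offset 0x20, matching the address in the trace.
struct FakeVar {
  char padding[0x20];  // placeholder for whatever members precede the mutex
  std::mutex mutex;    // at offset 0x20 in this illustrative layout
};

// Address that `&var->mutex` would evaluate to for an object located at
// `object_address`. Computed arithmetically, so nothing is dereferenced.
constexpr std::uintptr_t MutexAddressFor(std::uintptr_t object_address) {
  return object_address + offsetof(FakeVar, mutex);
}

// Assumed caller-side guard: returns false if any handle in the array is
// null, so the caller can fail loudly before handing the array to an engine.
bool AllVarsNonNull(const FakeVar* const* vars, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    if (vars[i] == nullptr) {
      return false;
    }
  }
  return true;
}
```

With this layout, `MutexAddressFor(0)` evaluates to `0x20`, the same value gdb reports for `mutex` in frames #0 to #2. A check like `AllVarsNonNull` on the caller side would turn the segfault into an explicit error while the root cause is tracked down.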

