eric-haibin-lin edited a comment on issue #18014: enabling mkldnn leads to segfault in bytePS
URL: https://github.com/apache/incubator-mxnet/issues/18014#issuecomment-612137404

With @leezu's help I built mxnet in debug mode with commit 2f6cdd383abbf46a37b84a5fad013726b5c62169. Here's the stacktrace with line numbers:

```
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
__GI___pthread_mutex_lock (mutex=0x20) at ../nptl/pthread_mutex_lock.c:65
65      ../nptl/pthread_mutex_lock.c: No such file or directory.
#0  __GI___pthread_mutex_lock (mutex=0x20) at ../nptl/pthread_mutex_lock.c:65
#1  0x00007fff5314d857 in __gthread_mutex_lock (__mutex=0x20) at /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:748
#2  0x00007fff5316b1c6 in std::mutex::lock (this=0x20) at /usr/include/c++/7/bits/std_mutex.h:103
#3  0x00007fff53182d14 in std::lock_guard<std::mutex>::lock_guard (this=0x7fffffffcb30, __m=...) at /usr/include/c++/7/bits/std_mutex.h:162
#4  0x00007fff53359852 in mxnet::engine::ThreadedVar::AppendWriteDependency (this=0x0, opr_block=0x294c2a8) at ../src/engine/threaded_engine.cc:74
#5  0x00007fff53355fd2 in mxnet::engine::ThreadedEngine::Push (this=0x2946590, op=0x294a708, exec_ctx=..., priority=0, profiling=false) at ../src/engine/threaded_engine.cc:311
#6  0x00007fff53356400 in mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool) (this=0x2946590, fn=..., exec_ctx=..., const_vars=std::vector of length 0, capacity 0, mutable_vars=std::vector of length 1, capacity 1 = {...}, prop=mxnet::FnProperty::kCPUPrioritized, priority=0, opr_name=0x7ffe356374ac "BytePSPushPull", wait=false) at ../src/engine/threaded_engine.cc:343
#7  0x00007fff53164592 in MXEnginePushAsync (async_func=0x7ffe355663e0 <byteps::mxnet::DoPushPull(void*, void*, void*)>, func_param=0x40f29520, deleter=0x7ffe35565a30 <byteps::mxnet::(anonymous namespace)::DeletePushPullParam(void*)>, ctx_handle=0x7ffe3bff8a40 <byteps::mxnet::(anonymous namespace)::MX_EXEC_CTX>, const_vars_handle=0x0, num_const_vars=0, mutable_vars_handle=0x7fffffffd168, num_mutable_vars=1, prop_handle=0x7ffe35637540 <byteps::mxnet::(anonymous namespace)::MX_FUNC_PROP>, priority=0, opr_name=0x7ffe356374ac "BytePSPushPull", wait=false) at ../src/c_api/c_api.cc:2482
#8  0x00007ffe3556618d in byteps::mxnet::byteps_mxnet_push_pull_async (tensor=0x409bf190, name=<optimized out>, version=0, priority=0, is_average=<optimized out>) at byteps/mxnet/ops.cc:116
#9  0x00007ffff65a6dae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#10 0x00007ffff65a671f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#11 0x00007ffff67ba5c4 in _ctypes_callproc () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#12 0x00007ffff67bac33 in ?? () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
```
