xinyu-intel commented on issue #18014: enabling mkldnn leads to segfault in bytePS
URL: https://github.com/apache/incubator-mxnet/issues/18014#issuecomment-612375492
 
 
   Building the latest MXNet without MKLDNN also hits this issue:
   ```
   cmake -DCMAKE_BUILD_TYPE=Debug -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_CUDA=ON -DUSE_MKLDNN=OFF -G Ninja ..
   ```
   ```
   Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
   __GI___pthread_mutex_lock (mutex=0x3a6772617f) at ../nptl/pthread_mutex_lock.c:65
   65   ../nptl/pthread_mutex_lock.c: No such file or directory.
   #0  __GI___pthread_mutex_lock (mutex=0x3a6772617f) at ../nptl/pthread_mutex_lock.c:65
   #1  0x00007fa43648b65b in __gthread_mutex_lock (__mutex=0x3a6772617f) at /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:748
   #2  0x00007fa4364adf3a in std::mutex::lock (this=0x3a6772617f) at /usr/include/c++/7/bits/std_mutex.h:103
   #3  0x00007fa4364c5bf4 in std::lock_guard<std::mutex>::lock_guard (this=0x7ffd3340e270, __m=...) at /usr/include/c++/7/bits/std_mutex.h:162
   #4  0x00007fa4366b2d6e in mxnet::engine::ThreadedVar::AppendWriteDependency (this=0x3a6772615f, opr_block=0x2f08190) at ../src/engine/threaded_engine.cc:74
   #5  0x00007fa4366af4f7 in mxnet::engine::ThreadedEngine::Push (this=0x2f053a0, op=0x2f06630, exec_ctx=..., priority=0, profiling=false) at ../src/engine/threaded_engine.cc:311
   #6  0x00007fa4366af924 in mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool) (this=0x2f053a0, fn=..., exec_ctx=..., const_vars=std::vector of length 0, capacity 0, mutable_vars=std::vector of length 1, capacity 1 = {...}, prop=mxnet::FnProperty::kCPUPrioritized, priority=0, opr_name=0x7fa2b16372ec "BytePSPushPull", wait=false) at ../src/engine/threaded_engine.cc:343
   #7  0x00007fa4364a72f6 in MXEnginePushAsync (async_func=0x7fa2b15659f0 <byteps::mxnet::DoPushPull(void*, void*, void*)>, func_param=0x6ee84170, deleter=0x7fa2b1565040 <byteps::mxnet::(anonymous namespace)::DeletePushPullParam(void*)>, ctx_handle=0x7fa2b7ff8a40 <byteps::mxnet::(anonymous namespace)::MX_EXEC_CTX>, const_vars_handle=0x0, num_const_vars=0, mutable_vars_handle=0x7ffd3340e8a8, num_mutable_vars=1, prop_handle=0x7fa2b1637380 <byteps::mxnet::(anonymous namespace)::MX_FUNC_PROP>, priority=0, opr_name=0x7fa2b16372ec "BytePSPushPull", wait=false) at ../src/c_api/c_api.cc:2665
   #8  0x00007fa2b156579d in byteps::mxnet::byteps_mxnet_push_pull_async (tensor=0x6d41f620, name=<optimized out>, version=0, priority=0, is_average=<optimized out>) at byteps/mxnet/ops.cc:116
   #9  0x00007fa50630fdae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
   #10 0x00007fa50630f71f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
   #11 0x00007fa5065235c4 in _ctypes_callproc () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
   #12 0x00007fa506523c33 in ?? () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
   ```
