eric-haibin-lin commented on issue #18014: enabling mkldnn leads to segfault in bytePS
URL: https://github.com/apache/incubator-mxnet/issues/18014#issuecomment-612126604

I also tried to build both mxnet and bytePS from source, with gcc 7.4, and the following is the stacktrace:

```
BytePS launching worker
warning: Error disabling address space randomization: Operation not permitted
--------------------------------------------------------------------------------
Layer (type)     Output Shape                                 Param #
================================================================================
Input            (1, 1, 28, 28)                               0
Activation-1     <Symbol hybridsequential0_conv0_relu_fwd>    0
Activation-2     (1, 20, 24, 24)                              0
Conv2D-3         (1, 20, 24, 24)                              520
MaxPool2D-4      (1, 20, 12, 12)                              0
Activation-5     <Symbol hybridsequential0_conv1_relu_fwd>    0
Activation-6     (1, 50, 8, 8)                                0
Conv2D-7         (1, 50, 8, 8)                                25050
MaxPool2D-8      (1, 50, 4, 4)                                0
Flatten-9        (1, 800)                                     0
Activation-10    <Symbol hybridsequential0_dense0_relu_fwd>   0
Activation-11    (1, 512)                                     0
Dense-12         (1, 512)                                     410112
Dense-13         (1, 10)                                      5130
================================================================================
Parameters in forward computation graph, duplicate included
   Total params: 440812
   Trainable params: 440812
   Non-trainable params: 0
Shared params in forward computation graph: 0
Unique parameters in model: 440812
--------------------------------------------------------------------------------

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
__GI___pthread_mutex_lock (mutex=0x20) at ../nptl/pthread_mutex_lock.c:65
65 ../nptl/pthread_mutex_lock.c: No such file or directory.
#0 __GI___pthread_mutex_lock (mutex=0x20) at ../nptl/pthread_mutex_lock.c:65
#1 0x00007f283d52f170 in mxnet::engine::ThreadedVar::AppendWriteDependency(mxnet::engine::OprBlock*) () from /mxnet/python/mxnet/../../build/libmxnet.so
#2 0x00007f283d52ad3f in mxnet::engine::ThreadedEngine::Push(mxnet::engine::Opr*, mxnet::Context, int, bool) () from /mxnet/python/mxnet/../../build/libmxnet.so
#3 0x00007f283d527b85 in mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool) () from /mxnet/python/mxnet/../../build/libmxnet.so
#4 0x00007f283d3dbcb1 in MXEnginePushAsync () from /mxnet/python/mxnet/../../build/libmxnet.so
#5 0x00007f271d56618d in byteps::mxnet::byteps_mxnet_push_pull_async (tensor=0x45797570, name=<optimized out>, version=0, priority=0, is_average=<optimized out>) at byteps/mxnet/ops.cc:116
#6 0x00007f28f290edae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#7 0x00007f28f290e71f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#8 0x00007f28f2b225c4 in _ctypes_callproc () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#9 0x00007f28f2b22c33 in ?? () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
```

Setting `MXNET_MKLDNN_ENABLED=0` did not help.
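
For anyone reproducing this, here is a minimal sketch of one way to pass `MXNET_MKLDNN_ENABLED=0` to a worker process. The helper name `env_with_mkldnn_disabled` is hypothetical, and the child command only echoes the variable; it stands in for the actual training script, which is not shown in this comment. The sketch assumes the variable has to be in the environment before libmxnet is loaded, which is why it is set on the child process rather than after import.

```python
import os
import subprocess
import sys


def env_with_mkldnn_disabled(base=None):
    """Return a copy of the environment with MXNET_MKLDNN_ENABLED set to "0".

    Hypothetical helper for illustration; `base` defaults to os.environ.
    """
    env = dict(os.environ if base is None else base)
    env["MXNET_MKLDNN_ENABLED"] = "0"
    return env


if __name__ == "__main__":
    # Placeholder child process: it only prints the variable it received.
    # Replace the command with the real worker launch line when reproducing.
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ['MXNET_MKLDNN_ENABLED'])"],
        env=env_with_mkldnn_disabled(),
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())
```

Building the environment as a copy keeps the parent process untouched, so runs with and without the variable can be compared from the same shell.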
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
