leezu edited a comment on issue #14979: [BUG] Using a package with MKL and GPU versions, using python to open a new process will cause an error
URL: https://github.com/apache/incubator-mxnet/issues/14979#issuecomment-562926756

There are currently two hypotheses about the root cause of this error (https://github.com/apache/incubator-mxnet/issues/14979#issuecomment-525103793):

a) a bug in llvm / intel openmp
b) an interaction between gomp and llvm / intel openmp

I did some more investigation and conclude that we can rule out option b. In particular, I compiled with

```
CC=clang-8 CXX=clang++-8 cmake -DUSE_CUDA=1 -DUSE_MKLDNN=1 -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DBUILD_CYTHON_MODULES=1 -DUSE_OPENCV=0 ..
```

We can investigate the direct shared library dependencies of the resulting `libmxnet.so`:

```
% readelf -Wa libmxnet.so | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libnvToolsExt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libopenblas.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libjemalloc.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [liblapack.so.3]
 0x0000000000000001 (NEEDED)             Shared library: [libcublas.so.10.0]
 0x0000000000000001 (NEEDED)             Shared library: [libcufft.so.10.0]
 0x0000000000000001 (NEEDED)             Shared library: [libcusolver.so.10.0]
 0x0000000000000001 (NEEDED)             Shared library: [libcurand.so.10.0]
 0x0000000000000001 (NEEDED)             Shared library: [libnvrtc.so.10.0]
 0x0000000000000001 (NEEDED)             Shared library: [libcuda.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libomp.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
```

Among
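As a side note, a check like the `grep NEEDED` above is easy to automate, e.g. to guard in CI against accidentally linking a gomp-based library into a libomp build. A minimal sketch (the helper name and regex are mine, not part of any existing tooling; it parses captured `readelf` output rather than invoking `readelf` itself):

```python
import re

def needed_libraries(readelf_output):
    """Extract the DT_NEEDED (direct dependency) entries from the
    output of `readelf -d` or `readelf -Wa`."""
    return re.findall(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]",
                      readelf_output)

sample = """
 0x0000000000000001 (NEEDED)             Shared library: [libomp.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""
print(needed_libraries(sample))  # ['libomp.so.5', 'libc.so.6']
```

Note that this only covers direct dependencies; `libgomp.so` enters through a transitive dependency, which is what the `ldd` output below shows.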
those, `libopenblas.so.0` is provided by the system and depends on `libgomp.so`. (If we compiled with OpenCV, it would also transitively depend on `libgomp.so`, so I disabled it for the purpose of this test.) We can see `libgomp.so` show up among the transitive shared library dependencies:

```
% ldd libmxnet.so
	linux-vdso.so.1 (0x00007ffd382ca000)
	libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007efdc9594000)
	libopenblas.so.0 => /usr/local/lib/libopenblas.so.0 (0x00007efdc85fb000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007efdc83f3000)
	libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 (0x00007efdc81bd000)
	liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007efdc78fe000)
	libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 (0x00007efdc3368000)
	libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 (0x00007efdbceb4000)
	libcusolver.so.10.0 => /usr/local/cuda/lib64/libcusolver.so.10.0 (0x00007efdb47cd000)
	libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 (0x00007efdb0666000)
	libnvrtc.so.10.0 => /usr/local/cuda/lib64/libnvrtc.so.10.0 (0x00007efdaf04a000)
	libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007efdaded3000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007efdadccf000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007efdadab0000)
	libomp.so.5 => /usr/lib/x86_64-linux-gnu/libomp.so.5 (0x00007efe411b4000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007efdad727000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007efdad389000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007efdad171000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efdacd80000)
	/lib64/ld-linux-x86-64.so.2 (0x00007efe410a8000)
	libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x00007efdac9a1000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007efdac772000)
	libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007efdac1b0000)
	libnvidia-fatbinaryloader.so.418.87.01 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.87.01 (0x00007efdabf62000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007efdabd22000)
```

Thus I recompiled OpenBLAS with clang. We can then investigate the transitive dependencies while replacing the system OpenBLAS with the llvm-openmp based OpenBLAS:

```
% LD_PRELOAD=/home/ubuntu/src/OpenBLAS/libopenblas.so ldd libmxnet.so
	linux-vdso.so.1 (0x00007ffd8eac5000)
	/home/ubuntu/src/OpenBLAS/libopenblas.so (0x00007f06ee33a000)
	libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f06ee131000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f06edf29000)
	libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 (0x00007f06edcf3000)
	liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007f06ed434000)
	libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 (0x00007f06e8e9e000)
	libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 (0x00007f06e29ea000)
	libcusolver.so.10.0 => /usr/local/cuda/lib64/libcusolver.so.10.0 (0x00007f06da303000)
	libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 (0x00007f06d619c000)
	libnvrtc.so.10.0 => /usr/local/cuda/lib64/libnvrtc.so.10.0 (0x00007f06d4b80000)
	libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f06d3a09000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f06d3805000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f06d35e6000)
	libomp.so.5 => /usr/lib/x86_64-linux-gnu/libomp.so.5 (0x00007f0766c79000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f06d325d000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f06d2ebf000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f06d2ca7000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f06d28b6000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0766b6d000)
	libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x00007f06d24d7000)
	libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007f06d1f15000)
	libnvidia-fatbinaryloader.so.418.87.01 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.87.01 (0x00007f06d1cc7000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f06d1a87000)
```

You can see that `libmxnet.so` no longer depends on `libgomp.so`. So let's check whether the test case by @fierceX still crashes:

```
% LD_PRELOAD=/home/ubuntu/src/OpenBLAS/libopenblas.so python3 ~/test.py
Stack trace:
  [bt] (0) /home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(+0x186faeb) [0x7f653ffcfaeb]
  [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20) [0x7f65cf785f20]
  [bt] (2) /usr/lib/x86_64-linux-gnu/libomp.so.5(+0x3d594) [0x7f65cd145594]
  [bt] (3) /home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::OpenMP::set_reserve_cores(int)+0xf5) [0x7f653fed5255]
  [bt] (4) /home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#2}::operator()() const+0x42) [0x7f653fee8752]
  [bt] (5) /home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(std::shared_ptr<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1> > mxnet::common::LazyAllocArray<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1> >::Get<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#2}>(int, mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#2})+0x487) [0x7f653fee5b87]
  [bt] (6) /home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)+0x223) [0x7f653fee12f3]
  [bt] (7) /home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::Push(mxnet::engine::Opr*, mxnet::Context, int, bool)+0x1dc) [0x7f653fed625c]
  [bt] (8) /home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool)+0x212) [0x7f653fed64d2]
```

As the crash remains, we can conclude that it is due to a bug in `libomp.so`, i.e. LLVM OpenMP. As @fierceX's use case is common and important among MXNet users, we must not default to LLVM OpenMP until this issue is fixed.

On a side note, forking a multithreaded process is largely undefined according to the POSIX standard: after `fork`, the child may only safely call async-signal-safe functions until it calls `exec`. So this is not strictly a bug in llvm-openmp (its behavior in this situation is undefined). However, as this is an important use case and it works with `gomp`, I suggest we just use `gomp`. You can also take a look at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035 for some more background.

@cjolivier01 please let me know if you see any issue with this investigation.

PS: To compile with clang, a small change to `dmlc-core` is required: https://github.com/dmlc/dmlc-core/compare/master...leezu:omp