leezu edited a comment on issue #14979: [BUG] Using a package with MKL and GPU 
versions, using python to open a new process will cause an error
URL: 
https://github.com/apache/incubator-mxnet/issues/14979#issuecomment-562926756
 
 
   There are currently two hypotheses about the root cause of this error 
(https://github.com/apache/incubator-mxnet/issues/14979#issuecomment-525103793):
 a) a bug in llvm / intel openmp; b) an interaction between gomp and llvm / 
intel openmp.
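   To reason about hypothesis b), it helps to check which OpenMP runtimes are 
actually mapped into a running process. A minimal sketch (assuming Linux and 
its `/proc/self/maps` interface; the name prefixes below are the common 
SONAMEs of the GNU, LLVM, and Intel runtimes):

```python
def loaded_openmp_runtimes(maps_path="/proc/self/maps"):
    """Return the OpenMP runtime libraries mapped into this process."""
    found = set()
    with open(maps_path) as f:
        for line in f:
            # GNU (libgomp), LLVM (libomp), Intel (libiomp)
            for name in ("libgomp", "libomp", "libiomp"):
                if name in line:
                    found.add(name)
    return sorted(found)

# In a process that loaded two runtimes, this reports both --
# which is exactly the situation hypothesis b) is about.
print(loaded_openmp_runtimes())
```

   Running this inside the failing script before and after importing mxnet 
would show directly whether `libgomp` and `libomp` coexist in the process.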
   
   I did some more investigation and conclude that we can rule out option b. In 
particular, I compiled with `CC=clang-8 CXX=clang++-8 cmake -DUSE_CUDA=1 
-DUSE_MKLDNN=1 -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DBUILD_CYTHON_MODULES=1 
-DUSE_OPENCV=0 ..`. 
   
   We can investigate the shared library dependencies of the resulting 
`libmxnet.so`:
   
   ```
   % readelf -Wa libmxnet.so | grep NEEDED
    0x0000000000000001 (NEEDED)             Shared library: [libnvToolsExt.so.1]
    0x0000000000000001 (NEEDED)             Shared library: [libopenblas.so.0]
    0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
    0x0000000000000001 (NEEDED)             Shared library: [libjemalloc.so.1]
    0x0000000000000001 (NEEDED)             Shared library: [liblapack.so.3]
    0x0000000000000001 (NEEDED)             Shared library: [libcublas.so.10.0]
    0x0000000000000001 (NEEDED)             Shared library: [libcufft.so.10.0]
    0x0000000000000001 (NEEDED)             Shared library: 
[libcusolver.so.10.0]
    0x0000000000000001 (NEEDED)             Shared library: [libcurand.so.10.0]
    0x0000000000000001 (NEEDED)             Shared library: [libnvrtc.so.10.0]
    0x0000000000000001 (NEEDED)             Shared library: [libcuda.so.1]
    0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
    0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
    0x0000000000000001 (NEEDED)             Shared library: [libomp.so.5]
    0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
    0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
    0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
    0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
    0x0000000000000001 (NEEDED)             Shared library: 
[ld-linux-x86-64.so.2]
   ```
   
   Among these, `libopenblas.so.0` is provided by the system and depends on 
`libgomp.so`. (If we compiled with OpenCV, it would also transitively depend on 
`libgomp.so`, which is why I disabled it for this test.) We can see 
`libgomp.so` show up among the transitive shared library dependencies:
   
   ```
   % ldd libmxnet.so
           linux-vdso.so.1 (0x00007ffd382ca000)
           libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 
(0x00007efdc9594000)
           libopenblas.so.0 => /usr/local/lib/libopenblas.so.0 
(0x00007efdc85fb000)
           librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007efdc83f3000)
           libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 
(0x00007efdc81bd000)
           liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 
(0x00007efdc78fe000)
           libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 
(0x00007efdc3368000)
           libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 
(0x00007efdbceb4000)
           libcusolver.so.10.0 => /usr/local/cuda/lib64/libcusolver.so.10.0 
(0x00007efdb47cd000)
           libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 
(0x00007efdb0666000)
           libnvrtc.so.10.0 => /usr/local/cuda/lib64/libnvrtc.so.10.0 
(0x00007efdaf04a000)
           libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 
(0x00007efdaded3000)
           libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007efdadccf000)
           libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
(0x00007efdadab0000)
           libomp.so.5 => /usr/lib/x86_64-linux-gnu/libomp.so.5 
(0x00007efe411b4000)
           libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 
(0x00007efdad727000)
           libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007efdad389000)
           libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
(0x00007efdad171000)
           libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efdacd80000)
           /lib64/ld-linux-x86-64.so.2 (0x00007efe410a8000)
           libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 
(0x00007efdac9a1000)
           libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 
(0x00007efdac772000)
           libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 
(0x00007efdac1b0000)
           libnvidia-fatbinaryloader.so.418.87.01 => 
/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.87.01 
(0x00007efdabf62000)
           libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 
(0x00007efdabd22000)
   
   ```
   
   Thus I recompiled OpenBLAS with clang. We can then investigate the 
transitive dependencies while replacing the system OpenBLAS with the 
llvm-openmp-based build:
   
   ```
   % LD_PRELOAD=/home/ubuntu/src/OpenBLAS/libopenblas.so ldd libmxnet.so
           linux-vdso.so.1 (0x00007ffd8eac5000)
           /home/ubuntu/src/OpenBLAS/libopenblas.so (0x00007f06ee33a000)
           libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 
(0x00007f06ee131000)
           librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f06edf29000)
           libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 
(0x00007f06edcf3000)
           liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 
(0x00007f06ed434000)
           libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 
(0x00007f06e8e9e000)
           libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 
(0x00007f06e29ea000)
           libcusolver.so.10.0 => /usr/local/cuda/lib64/libcusolver.so.10.0 
(0x00007f06da303000)
           libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 
(0x00007f06d619c000)
           libnvrtc.so.10.0 => /usr/local/cuda/lib64/libnvrtc.so.10.0 
(0x00007f06d4b80000)
           libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 
(0x00007f06d3a09000)
           libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f06d3805000)
           libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
(0x00007f06d35e6000)
           libomp.so.5 => /usr/lib/x86_64-linux-gnu/libomp.so.5 
(0x00007f0766c79000)
           libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 
(0x00007f06d325d000)
           libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f06d2ebf000)
           libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
(0x00007f06d2ca7000)
           libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f06d28b6000)
           /lib64/ld-linux-x86-64.so.2 (0x00007f0766b6d000)
           libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 
(0x00007f06d24d7000)
           libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 
(0x00007f06d1f15000)
           libnvidia-fatbinaryloader.so.418.87.01 => 
/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.87.01 
(0x00007f06d1cc7000)
           libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 
(0x00007f06d1a87000)
   ```
   
   and we find that `libmxnet.so` no longer depends on `libgomp.so`.
   
   So let's see if the test case by @fierceX still crashes:
   
   
   ```
   LD_PRELOAD=/home/ubuntu/src/OpenBLAS/libopenblas.so python3 ~/test.py
   
   Stack trace:
     [bt] (0) 
/home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(+0x186faeb) 
[0x7f653ffcfaeb]
     [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20) [0x7f65cf785f20]
     [bt] (2) /usr/lib/x86_64-linux-gnu/libomp.so.5(+0x3d594) [0x7f65cd145594]
     [bt] (3) 
/home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::OpenMP::set_reserve_cores(int)+0xf5)
 [0x7f653fed5255]
     [bt] (4) 
/home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*,
 bool)::{lambda()#2}::operator()() const+0x42) [0x7f653fee8752]
     [bt] (5) 
/home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(std::shared_ptr<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1>
 > 
mxnet::common::LazyAllocArray<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1>
 
>::Get<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*,
 bool)::{lambda()#2}>(int, 
mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, 
bool)::{lambda()#2})+0x487) [0x7f653fee5b87]
     [bt] (6) 
/home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*,
 bool)+0x223) [0x7f653fee12f3]
     [bt] (7) 
/home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::Push(mxnet::engine::Opr*,
 mxnet::Context, int, bool)+0x1dc) [0x7f653fed625c]
     [bt] (8) 
/home/ubuntu/src/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::PushAsync(std::function<void
 (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, 
std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, 
std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, 
mxnet::FnProperty, int, char const*, bool)+0x212) [0x7f653fed64d2]
   ```
   
   As the crash remains even without `libgomp.so` loaded, we can conclude that 
it is due to a bug in `libomp.so`, i.e. the llvm openmp runtime.
   
   As @fierceX's use-case is common and important among MXNet users, we must 
not default to llvm openmp until this issue is fixed.
   
   On a side note, forking in a multithreaded process is, according to the 
POSIX standard, largely undefined behavior (the child is only guaranteed to be 
able to call `exec` afterwards). So strictly speaking this is not a bug in 
llvm-openmp, as the behavior is undefined anyway. However, as it is an 
important use-case, and as it works with `gomp`, I suggest we just use `gomp`. 
You can also take a look at 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035 for some more background.
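   As a further side note on working around this from the Python side: a child 
that is spawned rather than forked never inherits the parent's post-fork thread 
state, which sidesteps the undefined behavior entirely. A minimal sketch using 
the standard `multiprocessing` `spawn` start method (the `square` helper is 
purely illustrative, not part of @fierceX's test case):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # 'spawn' launches a fresh Python interpreter for each worker,
    # so the child never inherits the parent's (possibly inconsistent)
    # post-fork thread and OpenMP runtime state.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # prints [1, 4, 9]
```

   This is only a user-side mitigation, of course; it does not remove the need 
to pick a default OpenMP runtime that tolerates fork.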
   
   @cjolivier01 please let me know if you see any issue with this investigation.
   
   PS: To compile with clang, a small change to `dmlc-core` is required 
https://github.com/dmlc/dmlc-core/compare/master...leezu:omp
