larroy commented on issue #14979: [BUG] Using a package with MKL and GPU versions, using python to open a new process will cause an error
URL: https://github.com/apache/incubator-mxnet/issues/14979#issuecomment-525102640
 
 
   Found this related info:
   
   
https://stackoverflow.com/questions/25986091/telling-gcc-to-not-link-libgomp-so-it-links-libiomp5-instead
   
   Added a sleep so that gdb can be attached to the "train" pid before it continues:
   ```
   from multiprocessing import Process
   import gluonnlp as nlp
   import numpy as np
   from gluonnlp.data import SQuAD
    from mxnet import nd, gluon
   import mxnet as mx
   from mxnet.gluon import nn
   import os
   import time
   
   class Transform(object):
       def __init__(self):
           pass
   
        def __call__(self, record_index, question_id, question, context,
                     answer_list, answer_start_list):
            return np.ones((100, 1)), np.ones((100, 3))
   
   def train():
       print("train pid: {}".format(os.getpid()))
       print("10 9...")
       time.sleep(10)
       print("go")
       train_data = SQuAD('train')
        dataloader = gluon.data.DataLoader(train_data.transform(Transform()),
                                           batch_size=128, shuffle=True,
                                           num_workers=4)
       net = nn.HybridSequential()
       net.add(nn.Dense(10))
       net.initialize(mx.init.Xavier(), ctx=mx.gpu(0))
       print(net)
   
   print("parent pid: {}".format(os.getpid()))
   p = Process(target=train)
   p.start()
   p.join()
   
   
   ```
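    A possible mitigation (a sketch, not something tested against this exact repro): use the "spawn" start method so the child is a fresh interpreter and never inherits the parent's already-initialized libiomp5 state through fork():

    ```python
    import multiprocessing as mp
    import os

    def train():
        # The real repro would build the DataLoader / net here; with spawn,
        # the OpenMP runtime is initialized from scratch in this process.
        print("train pid: {}".format(os.getpid()))

    if __name__ == "__main__":
        # "spawn" starts a clean Python process instead of fork()ing the
        # parent, which copies already-initialized OpenMP thread state.
        ctx = mp.get_context("spawn")
        p = ctx.Process(target=train)
        p.start()
        p.join()
    ```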
   
   Used this branch to make sure only the Intel OpenMP runtime (libiomp5) is linked:
   
   https://github.com/larroy/mxnet/tree/omp_chooser
   
    ```
    piotr@ip-172-31-22-252:0: ~/mxnet [omp_chooser]> ldd build/libmxnet.so | grep omp
            libiomp5.so => /home/piotr/mxnet/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so (0x00007f3507f63000)
    ```
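    One can also double-check at runtime (not only at link time) which OpenMP runtimes are actually mapped into a live process. This is a generic Linux /proc sketch, not part of the original report:

    ```shell
    # Count distinct OpenMP runtimes mapped into the current process.
    # Seeing both libgomp and libiomp5 at once is exactly the
    # duplicate-runtime situation the omp_chooser branch is meant to rule out.
    grep -oE 'lib(gomp|iomp5)\.so[0-9.]*' /proc/self/maps | sort -u | wc -l
    ```

    For the crashing child, substitute its pid for `self`.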
   
   Build config:
   ```
   piotr@ip-172-31-22-252:0: ~/mxnet [omp_chooser]> cat cmake_options.yml
   # Licensed to the Apache Software Foundation (ASF) under one
   # or more contributor license agreements.  See the NOTICE file
   # distributed with this work for additional information
   # regarding copyright ownership.  The ASF licenses this file
   # to you under the Apache License, Version 2.0 (the
   # "License"); you may not use this file except in compliance
   # with the License.  You may obtain a copy of the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing,
   # software distributed under the License is distributed on an
   # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   # KIND, either express or implied.  See the License for the
   # specific language governing permissions and limitations
   # under the License.
   
   --- # CMake configuration
   USE_CUDA: "ON" # Build with CUDA support
   USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
   USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
   USE_OPENCV: "ON" # Build with OpenCV support
   USE_OPENMP: "ON" # Build with Openmp support
    USE_CUDNN: "ON" # Build with cuDNN support; CUDNN_ROOT can be set for the search path
   USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
    USE_F16C: "ON" # Build with x86 F16C instruction support; autodetects support if "ON"
   USE_LAPACK: "ON" # Build with lapack support
   USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
    USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
    USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
   USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
   USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
   USE_JEMALLOC: "ON" # Build with Jemalloc support
   USE_PROFILER: "ON" # Build with Profiler support
   USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
   USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
   USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
   USE_CPP_PACKAGE: "OFF" # Build C++ Package
   USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
   USE_GPROF: "OFF" # Compile with gprof (profiling) flag
   USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
    USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune); VTUNE_ROOT can be set for the search path
   ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
   BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
   INSTALL_EXAMPLES: "OFF" # Install the example source files.
   USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
    USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT.
   USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
    ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
   CMAKE_BUILD_TYPE: "RelWithDebInfo"
   CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
   CMAKE_C_COMPILER_LAUNCHER: "ccache"
   CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
   
   ```
   
    Used `./dev_menu.py build`, then:

    ```
    source py3_venv/bin/activate.fish
    pip install gluonnlp
    ```

   ```
   (py3_venv) piotr@ip-172-31-22-252:1: ~/mxnet [omp_chooser]> python test.py
   parent pid: 31483
   train pid: 31660
   10 9...
   
   
   
   
   
   
   go
   pid: 31702
   pid: 31711
   pid: 31702
   pid: 31720
   
   Segmentation fault: 11
   
   Process id: 31660
   Stack trace:
      [bt] (0) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(+0x37efa99) [0x7f5c4c996a99]
      [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20) [0x7f5d010b1f20]
      [bt] (2) /home/piotr/mxnet/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so(+0xac19c) [0x7f5cfb7f819c]
      [bt] (3) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::OpenMP::set_reserve_cores(int)+0x81a) [0x7f5c4c8e7d8a]
      [bt] (4) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#2}::operator()() const+0x4f) [0x7f5c4c8f62df]
      [bt] (5) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(std::shared_ptr<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1> > mxnet::common::LazyAllocArray<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1> >::Get<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#2}>(int, mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#2})+0x3c2) [0x7f5c4c8f7a02]
      [bt] (6) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)+0x481) [0x7f5c4c8f9021]
      [bt] (7) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::Push(mxnet::engine::Opr*, mxnet::Context, int, bool)+0x19f) [0x7f5c4c8eb2bf]
      [bt] (8) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool)+0x155) [0x7f5c4c8e8995]
   ```
   
   
   ```
   cgdb /home/piotr/mxnet/py3_venv/bin/python
   attach <PID printed above before sleep continues>
   ```
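    Independently of gdb, Python's faulthandler can be enabled in the child so that a SIGSEGV also dumps the Python-level stack of every thread to stderr, which helps correlate the native [bt] frames with the Python call that triggered them (generic sketch, not from the original repro):

    ```python
    import faulthandler
    import sys

    # On SIGSEGV the interpreter now dumps the Python traceback of all
    # threads to stderr before the process dies.
    faulthandler.enable(file=sys.stderr, all_threads=True)
    ```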
