larroy commented on issue #14979: [BUG] Using a package with MKL and GPU versions, using python to open a new process will cause an error
URL: https://github.com/apache/incubator-mxnet/issues/14979#issuecomment-525102640

Found this related info:
https://stackoverflow.com/questions/25986091/telling-gcc-to-not-link-libgomp-so-it-links-libiomp5-instead

Added a sleep to the repro script so there is time to attach gdb to the "train pid":

```
from multiprocessing import Process
import gluonnlp as nlp
import numpy as np
from gluonnlp.data import SQuAD
from mxnet import nd, gluon
import mxnet as mx
from mxnet.gluon import nn
import os
import time


class Transform(object):
    def __init__(self):
        pass

    def __call__(self, record_index, question_id, question, context, answer_list,
                 answer_start_list):
        return np.ones((100, 1)), np.ones((100, 3))


def train():
    print("train pid: {}".format(os.getpid()))
    # Pause so a debugger can be attached to this child process.
    print("10 9...")
    time.sleep(10)
    print("go")
    train_data = SQuAD('train')
    dataloader = gluon.data.DataLoader(train_data.transform(Transform()),
                                       batch_size=128, shuffle=True, num_workers=4)
    net = nn.HybridSequential()
    net.add(nn.Dense(10))
    net.initialize(mx.init.Xavier(), ctx=mx.gpu(0))
    print(net)


print("parent pid: {}".format(os.getpid()))
p = Process(target=train)
p.start()
p.join()
```

Used this branch to make sure only the Intel OpenMP runtime (libiomp5) is linked:
https://github.com/larroy/mxnet/tree/omp_chooser

piotr@ip-172-31-22-252:0: ~/mxnet [omp_chooser]> ldd build/libmxnet.so | grep omp
libiomp5.so => /home/piotr/mxnet/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so (0x00007f3507f63000)

Build config:

```
piotr@ip-172-31-22-252:0: ~/mxnet [omp_chooser]> cat cmake_options.yml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
---
# CMake configuration
USE_CUDA: "ON" # Build with CUDA support
USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
USE_OPENCV: "ON" # Build with OpenCV support
USE_OPENMP: "ON" # Build with Openmp support
USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
USE_LAPACK: "ON" # Build with lapack support
USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
USE_JEMALLOC: "ON" # Build with Jemalloc support
USE_PROFILER: "ON" # Build with Profiler support
USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
USE_CPP_PACKAGE: "OFF" # Build C++ Package
USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
USE_GPROF: "OFF" # Compile with gprof (profiling) flag
USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
INSTALL_EXAMPLES: "OFF" # Install the example source files.
USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
CMAKE_BUILD_TYPE: "RelWithDebInfo"
CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
CMAKE_C_COMPILER_LAUNCHER: "ccache"
CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
```

Built with `./dev_menu.py build`, then set up the environment:

source py3_venv/bin/activate.fish
pip install gluonnlp

Running the script segfaults in the child ("train") process, inside libiomp5.so, called from `mxnet::engine::OpenMP::set_reserve_cores(int)`:

```
(py3_venv) piotr@ip-172-31-22-252:1: ~/mxnet [omp_chooser]> python test.py
parent pid: 31483
train pid: 31660
10 9...
go
pid: 31702
pid: 31711
pid: 31702
pid: 31720
Segmentation fault: 11
Process id: 31660
Stack trace:
[bt] (0) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(+0x37efa99) [0x7f5c4c996a99]
[bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20) [0x7f5d010b1f20]
[bt] (2) /home/piotr/mxnet/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so(+0xac19c) [0x7f5cfb7f819c]
[bt] (3) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::OpenMP::set_reserve_cores(int)+0x81a) [0x7f5c4c8e7d8a]
[bt] (4) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#2}::operator()() const+0x4f) [0x7f5c4c8f62df]
[bt] (5) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(std::shared_ptr<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1> > mxnet::common::LazyAllocArray<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1> >::Get<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#2}>(int, mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#2})+0x3c2) [0x7f5c4c8f7a02]
[bt] (6) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)+0x481) [0x7f5c4c8f9021]
[bt] (7) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::Push(mxnet::engine::Opr*, mxnet::Context, int, bool)+0x19f) [0x7f5c4c8eb2bf]
[bt] (8) /home/piotr/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool)+0x155) [0x7f5c4c8e8995]
```

To debug, attach to the train process during the sleep:

```
cgdb /home/piotr/mxnet/py3_venv/bin/python
attach <PID printed above before sleep continues>
```
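
The `ldd` check only shows what `libmxnet.so` links against. Other Python packages imported in the same process could still pull in a second OpenMP runtime at load time. A minimal sketch for checking what a live process has actually mapped, assuming Linux and a readable `/proc/<pid>/maps`. The helper names `omp_runtimes_in_maps` and `omp_runtimes_of_pid` are hypothetical and not part of MXNet:

```python
import re

# File-name patterns for common OpenMP runtimes:
# GNU (libgomp), Intel (libiomp5), LLVM (libomp).
_OMP_RE = re.compile(r"/(lib(?:gomp|iomp5|omp))(?=[.-])[^/\s]*\.so[^/\s]*$")


def omp_runtimes_in_maps(maps_text):
    """Return {runtime_name: set_of_paths} found in /proc/<pid>/maps text."""
    found = {}
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [path]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        match = _OMP_RE.search(path)
        if match:
            found.setdefault(match.group(1), set()).add(path)
    return found


def omp_runtimes_of_pid(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and report mapped OpenMP runtimes."""
    with open("/proc/{}/maps".format(pid)) as f:
        return omp_runtimes_in_maps(f.read())
```

Calling `omp_runtimes_of_pid(<train pid>)` during the sleep (or `omp_runtimes_of_pid()` from inside `train()`) and getting more than one key back would suggest that two OpenMP runtimes are loaded in that process. A single `libiomp5` entry would be consistent with the `ldd` output above.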
