idealboy opened a new issue #13075: Performance with multi-threaded inference is slow
URL: https://github.com/apache/incubator-mxnet/issues/13075
 
 
   ## Description
   When I run inference with multiple threads (each thread creates its own predictor handle against the same libmxnet.so), inference is very slow.
   
   I use MXPredReshape in some of the code to adapt the predictor to different input shapes.
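To make the slowdown measurable, a small timing harness like the one below can show whether aggregate throughput actually drops as threads are added. This is a minimal sketch, not MXNet code: `workload` is a hypothetical stand-in for one forward pass through a per-thread predictor handle, and would be replaced by the real inference call.

```python
import threading
import time

def throughput(workload, n_threads, iters_per_thread):
    """Run `workload` concurrently and return aggregate calls per second.

    `workload` is a placeholder for one forward pass through a
    per-thread predictor handle (hypothetical; substitute the real
    inference call when reproducing the issue).
    """
    def worker():
        for _ in range(iters_per_thread):
            workload()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return (n_threads * iters_per_thread) / elapsed

# Example with a dummy CPU-bound workload: compare 1 thread vs. 4 threads.
dummy = lambda: sum(i * i for i in range(1000))
single = throughput(dummy, n_threads=1, iters_per_thread=50)
multi = throughput(dummy, n_threads=4, iters_per_thread=50)
```

If per-call latency grows much faster than the thread count, the threads are contending on something shared (e.g. a global lock or OpenMP thread pool) rather than running independently.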
   
   ## Environment info (Required)
   
   Output of `diagnose.py`:
   
   ```
   ----------Python Info----------
   ('Version      :', '2.7.5')
   ('Compiler     :', 'GCC 4.8.2 20140120 (Red Hat 4.8.2-16)')
   ('Build        :', ('default', 'Jun 17 2014 18:11:42'))
   ('Arch         :', ('64bit', 'ELF'))
   ------------Pip Info-----------
   ('Version      :', '9.0.1')
   ('Directory    :', '/usr/lib/python2.7/site-packages/pip')
   ----------MXNet Info-----------
   No MXNet installed.
   ----------System Info----------
   ('Platform     :', 'Linux-4.1.5-1.el7.centos.x86_64-x86_64-with-centos-7.0.1406-Core')
   ('system       :', 'Linux')
   ('node         :', 'face00')
   ('release      :', '4.1.5-1.el7.centos.x86_64')
   ('version      :', '#1 SMP Tue Aug 11 13:53:50 EDT 2015')
   ----------Hardware Info----------
   ('machine      :', 'x86_64')
   ('processor    :', 'x86_64')
   Architecture:          x86_64
   CPU op-mode(s):        32-bit, 64-bit
   Byte Order:            Little Endian
   CPU(s):                16
   On-line CPU(s) list:   0-15
   Thread(s) per core:    1
   Core(s) per socket:    1
   Socket(s):             16
   NUMA node(s):          1
   Vendor ID:             GenuineIntel
   CPU family:            6
   Model:                 22
   Model name:           
   Stepping:              3
   CPU MHz:               2494.224
   BogoMIPS:              4988.44
   Hypervisor vendor:     KVM
   Virtualization type:   full
   L1d cache:             32K
   L1i cache:             32K
   L2 cache:              256K
   L3 cache:              30720K
   NUMA node0 CPU(s):     0-15
   ```
   
   Package used (Python/R/Scala/Julia): Python
   
   ## Build info (Required if built from source)
   
   Compiler (gcc/clang/mingw/visual studio):
   gcc4.8.5
   
   MXNet commit hash:
   0a286a002c6f3c98843389fedfedf89d97324fda
   
   Branch: MXNet 1.3.x
   
   Build config (config.mk, excerpt):
   
   ```
   export CC = gcc
   export CXX = g++
   export NVCC = nvcc
   
   # whether compile with options for MXNet developer
   DEV = 0
   
   # whether compile with debug
   DEBUG = 0
   
   # whether to turn on segfault signal handler to log the stack trace
   USE_SIGNAL_HANDLER =
   
   # the additional link flags you want to add
   ADD_LDFLAGS = -L/usr/local/lib
   
   # the additional compile flags you want to add
   ADD_CFLAGS = -I/usr/local/include
   
   USE_CUDA = 0
   
   # add the path to CUDA library to link and compile flag
   # if you have already add them to environment variable, leave it as NONE
   USE_CUDA_PATH = /usr/local/cuda-9.1
   # USE_CUDA_PATH = /usr/local/cuda
   
   # whether to enable CUDA runtime compilation
   ENABLE_CUDA_RTC = 1
   
   USE_OPENMP = 1
   
   # whether use MKL-DNN library
   USE_MKLDNN = 0
   
   # whether use NNPACK library
   USE_NNPACK = 0
   
   # choose the version of blas you want to use
   # can be: mkl, blas, atlas, openblas
   # in default use atlas for linux while apple for osx
   UNAME_S := $(shell uname -s)
   ifeq ($(UNAME_S), Darwin)
   USE_BLAS = apple
   else
   USE_BLAS = openblas
   endif
   
   # whether use lapack during compilation
   # only effective when compiled with blas versions openblas/apple/atlas/mkl
   USE_LAPACK = 0
   
   # path to lapack library in case of a non-standard installation
   USE_LAPACK_PATH =
   
   # add path to intel library, you may need it for MKL, if you did not add the path
   # to environment variable
   USE_INTEL_PATH = NONE
   
   # If use MKL only for BLAS, choose static link automatically to allow python wrapper
   ifeq ($(USE_BLAS), mkl)
   USE_STATIC_MKL = 1
   else
   USE_STATIC_MKL = NONE
   
   USE_HDFS = 0
   
   # path to libjvm.so. required if USE_HDFS=1
   LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server
   
   # whether or not allow to read and write AWS S3 directly. If yes, then
   # libcurl4-openssl-dev is required, it can be installed on Ubuntu by
   # sudo apt-get install -y libcurl4-openssl-dev
   USE_S3 = 0
   
   #----------------------------
   # performance settings
   #----------------------------
   # Use operator tuning
   USE_OPERATOR_TUNING = 1
   
   # Use gperftools if found
   USE_GPERFTOOLS = 0
   
   # Use JEMalloc if found, and not using gperftools
   USE_JEMALLOC = 1
   
   #----------------------------
   # additional operators
   #----------------------------
   
   # path to folders containing projects specific operators that you don't want to put in src/operators
   EXTRA_OPERATORS =
   
   #----------------------------
   # other features
   #----------------------------
   
   # Create C++ interface package
   USE_CPP_PACKAGE = 1
   ```
   
   
   ## Error Message:
   (Paste the complete error message, including stack trace.)
   
   ## Minimum reproducible example
   (If you are using your own code, please provide a short script that 
reproduces the error. Otherwise, please provide link to the existing example.)
   
   ## Steps to reproduce
   (Paste the commands you ran that produced the error.)
   
   1.
   2.
   
   ## What have you tried to solve it?
   
   1.
   2.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
