eloi-loomai opened a new issue #15745: Memory layout in the LSTM operator

URL: https://github.com/apache/incubator-mxnet/issues/15745

## Description

Suspected bug in the LSTM RNN operator.

## Environment info (Required)

```
----------Python Info----------
Version : 3.7.2
Compiler : Clang 4.0.1 (tags/RELEASE_401/final)
Build : ('default', 'Dec 29 2018 00:00:04')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.0.1
Directory : /Users/edubois/anaconda3/envs/py36/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /Users/edubois/_DEV/3rdParties/incubator-mxnet/python/mxnet
Commit hash file "/Users/edubois/_DEV/3rdParties/incubator-mxnet/python/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library : ['/Users/edubois/_DEV/3rdParties/incubator-mxnet/python/mxnet/../../build/libmxnet.dylib']
Build features:
✖ CUDA
✖ CUDNN
✖ NCCL
✖ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✔ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✔ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✖ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform : Darwin-18.2.0-x86_64-i386-64bit
system : Darwin
node : MBP-de-Eloi
release : 18.2.0
version : Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0223 sec, LOAD: 0.7915 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1524 sec, LOAD: 0.5313 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.2115 sec, LOAD: 0.6373 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0434 sec, LOAD: 0.3803 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0296 sec, LOAD: 0.6332 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0392 sec, LOAD: 0.2061 sec.
```

Package used (Python/R/Scala/Julia): C++

## Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): clang

MXNet commit hash: `24cce9e3c99e499b696b779cbb3b863145f473f1`

Build config:

```
#---------------------
# choice of compiler
#--------------------

ifndef CC
export CC = gcc
endif
ifndef CXX
export CXX = g++
endif
ifndef NVCC
export NVCC = nvcc
endif

# whether compile with options for MXNet developer
DEV = 0

# whether compile with debug
DEBUG = 0

# whether to turn on segfault signal handler to log the stack trace
USE_SIGNAL_HANDLER =

# the additional link flags you want to add
ADD_LDFLAGS =

# the additional compile flags you want to add
ADD_CFLAGS =

# whether to build operators written in TVM
USE_TVM_OP = 0

#---------------------------------------------
# matrix computation libraries for CPU/GPU
#---------------------------------------------

# whether use CUDA during compile
USE_CUDA = 0

# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
# USE_CUDA_PATH = /usr/local/cuda
USE_CUDA_PATH = NONE

# whether to enable CUDA runtime compilation
ENABLE_CUDA_RTC = 1

# whether use CuDNN R3 library
USE_CUDNN = 0

# whether to use NVTX when profiling
USE_NVTX = 0

#whether to use NCCL library
USE_NCCL = 0
#add the path to NCCL library
USE_NCCL_PATH = NONE

# whether use opencv during compilation
# you can disable it, however, you will not able to use
# imbin iterator
USE_OPENCV = 1
# Add OpenCV include path, in which the directory `opencv2` exists
USE_OPENCV_INC_PATH = NONE
# Add OpenCV shared library path, in which the shared library exists
USE_OPENCV_LIB_PATH = NONE

#whether use libjpeg-turbo for image decode without OpenCV wrapper
USE_LIBJPEG_TURBO = 0
#add the path to libjpeg-turbo library
USE_LIBJPEG_TURBO_PATH = NONE

# use openmp for parallelization
USE_OPENMP = 1

# whether use MKL-DNN library: 0 = disabled, 1 = enabled
# if USE_MKLDNN is not defined, MKL-DNN will be enabled by default on x86 Linux.
# you can disable it explicity with USE_MKLDNN = 0
USE_MKLDNN =

# whether use NNPACK library
USE_NNPACK = 0

# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
# in default use atlas for linux while apple for osx
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S), Darwin)
USE_BLAS = apple
else
USE_BLAS = atlas
endif

# whether use lapack during compilation
# only effective when compiled with blas versions openblas/apple/atlas/mkl
USE_LAPACK = 1

# path to lapack library in case of a non-standard installation
USE_LAPACK_PATH =

# add path to intel library, you may need it for MKL, if you did not add the path
# to environment variable
USE_INTEL_PATH = NONE

# If use MKL only for BLAS, choose static link automatically to allow python wrapper
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
else
USE_STATIC_MKL = NONE
endif

#----------------------------
# Settings for power and arm arch
#----------------------------
ARCH := $(shell uname -a)
ifneq (,$(filter $(ARCH), armv6l armv7l powerpc64le ppc64le aarch64))
USE_SSE=0
USE_F16C=0
else
USE_SSE=1
endif

#----------------------------
# F16C instruction support for faster arithmetic of fp16 on CPU
#----------------------------
# For distributed training with fp16, this helps even if training on GPUs
# If left empty, checks CPU support and turns it on.
# For cross compilation, please check support for F16C on target device and turn off if necessary.
USE_F16C =

#----------------------------
# distributed computing
#----------------------------

# whether or not to enable multi-machine supporting
USE_DIST_KVSTORE = 0

# whether or not allow to read and write HDFS directly. If yes, then hadoop is
# required
USE_HDFS = 0

# path to libjvm.so. required if USE_HDFS=1
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server

# whether or not allow to read and write AWS S3 directly. If yes, then
# libcurl4-openssl-dev is required, it can be installed on Ubuntu by
# sudo apt-get install -y libcurl4-openssl-dev
USE_S3 = 0

#----------------------------
# performance settings
#----------------------------
# Use operator tuning
USE_OPERATOR_TUNING = 1

# Use gperftools if found
# Disable because of #8968
USE_GPERFTOOLS = 0

# path to gperftools (tcmalloc) library in case of a non-standard installation
USE_GPERFTOOLS_PATH =

# Link gperftools statically
USE_GPERFTOOLS_STATIC =

# Use JEMalloc if found, and not using gperftools
USE_JEMALLOC = 1

# path to jemalloc library in case of a non-standard installation
USE_JEMALLOC_PATH =

# Link jemalloc statically
USE_JEMALLOC_STATIC =

#----------------------------
# additional operators
#----------------------------

# path to folders containing projects specific operators that you don't want to put in src/operators
EXTRA_OPERATORS =

#----------------------------
# other features
#----------------------------

# Create C++ interface package
USE_CPP_PACKAGE = 0

# Use int64_t type to represent the total number of elements in a tensor
# This will cause performance degradation reported in issue #14496
# Set to 1 for large tensor with tensor size greater than INT32_MAX i.e. 2147483647
# Note: the size of each dimension is still bounded by INT32_MAX
USE_INT64_TENSOR_SIZE = 0

# Python executable. Needed for cython target
PYTHON = python

#----------------------------
# plugins
#----------------------------

# whether to use caffe integration. This requires installing caffe.
# You also need to add CAFFE_PATH/build/lib to your LD_LIBRARY_PATH
# CAFFE_PATH = $(HOME)/caffe
# MXNET_PLUGINS += plugin/caffe/caffe.mk

# WARPCTC_PATH = $(HOME)/warp-ctc
# MXNET_PLUGINS += plugin/warpctc/warpctc.mk

# whether to use sframe integration. This requires build sframe
# [email protected]:dato-code/SFrame.git
# SFRAME_PATH = $(HOME)/SFrame
# MXNET_PLUGINS += plugin/sframe/plugin.mk
```

## Error Message:

This line looks wrong:

https://github.com/apache/incubator-mxnet/blob/24cce9e3c99e499b696b779cbb3b863145f473f1/src/operator/rnn.cc#L320

`DType* bias_n = weight_iter_n + L * H * ngates * H;`

Shouldn't it be:

`DType* bias_n = weight_iter_n + L * ngates * H;`

Just trying to understand the memory order of the weights.
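For context on why the two expressions differ: `ngates * H * H` is the size of one hidden-to-hidden weight matrix, while `ngates * H` is the size of one bias vector, so the chosen stride determines whether `bias_n` steps over whole weight matrices or over bias vectors. Below is a minimal, self-contained C++ sketch of one *assumed* flat parameter layout (all weight matrices first, then all biases, single direction); the names `L`, `I`, `H`, and `ngates` mirror those in rnn.cc, but the layout itself is only an assumption for illustration, not a statement of what the operator actually does.

```c++
// Hypothetical sketch of a flat LSTM parameter blob, only to illustrate the
// offsets under discussion. Assumed order (single direction, NOT necessarily
// the layout rnn.cc uses):
//   [ W_ih(0), W_hh(0), ..., W_ih(L-1), W_hh(L-1),
//     b_ih(0), b_hh(0), ..., b_ih(L-1), b_hh(L-1) ]
#include <cstddef>
#include <iostream>

int main() {
  const std::size_t L = 2;       // number of layers
  const std::size_t I = 8;       // input size of the first layer
  const std::size_t H = 16;      // hidden size
  const std::size_t ngates = 4;  // LSTM gates: input, forget, cell, output

  // Sizes of the individual blocks in the assumed layout.
  const std::size_t w_ih0 = ngates * H * I;   // layer 0 input->hidden weights
  const std::size_t w_ih_n = ngates * H * H;  // layer >0 input->hidden weights
  const std::size_t w_hh = ngates * H * H;    // hidden->hidden weights, any layer
  const std::size_t b_layer = 2 * ngates * H; // b_ih + b_hh for one layer

  // Start of the bias region: after every weight matrix of every layer.
  const std::size_t weights_total = (w_ih0 + w_hh) + (L - 1) * (w_ih_n + w_hh);

  std::cout << "bias region starts at offset " << weights_total << "\n";
  std::cout << "weight-matrix stride: " << w_hh << " (= ngates * H * H)\n";
  std::cout << "bias-vector stride:   " << ngates * H << " (= ngates * H)\n";
  return 0;
}
```

Under an assumption like this, advancing from a weight pointer to the biases would involve weight-matrix-sized strides (`ngates * H * H` per layer), whereas stepping between per-layer bias blocks would use bias-sized strides (`ngates * H`), which is exactly the distinction between the two candidate expressions above.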
