[GitHub] [incubator-mxnet] connorgoggins opened a new issue #17595: MKLDNN incompatibility with large tensor (dim >= 2^32) data

GitBox Fri, 14 Feb 2020 11:20:48 -0800

connorgoggins opened a new issue #17595: MKLDNN incompatibility with large 
tensor (dim >= 2^32) data
URL: https://github.com/apache/incubator-mxnet/issues/17595
 
 
   ## Description
   While testing individual ops for large tensor (dimension >= 2^32) input 
functionality, I found an error in MKLDNN. Within 
`3rdparty/mkldnn/src/cpu/gemm/gemm.cpp` on line 43 there is a function which 
takes in several parameters, including `M` (the variable used to accept the 
data dimension in the input). `M` is designated as an `int`, so when the value 
2^32 is passed in as the first dimension of the input data the > 0 assertion on 
the next line fails (since the `int` dtype in C++ interprets 2^32 as 0).
   
   Note that this error occurs whenever MKLDNN is enabled - whether the BLAS 
engine is MKL, OpenBLAS, or none. When MKLDNN is disabled, the error does not 
occur.
   
   
   ## Failing Environments
   ### BLAS = None
   ✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ 
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✖ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✔ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✔ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP
   
   ### BLAS = MKL
   ✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ 
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✖ BLAS_OPEN, ✖ BLAS_ATLAS, ✔ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✔ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✔ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP
   
   ### BLAS = OpenBLAS
   ✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ 
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✔ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✔ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP
   
   ## Successful Environments
   ### BLAS = None
   ✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ 
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✖ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✔ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP
   
   ### BLAS = MKL
   ✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ 
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✖ BLAS_OPEN, ✖ BLAS_ATLAS, ✔ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✔ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP
   
   ### BLAS = OpenBLAS
   ✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ 
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✔ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP
   
   ### Steps to reproduce
   Run `mx.nd.FullyConnected(data=mx.nd.random_normal(shape=(2**32,1)), 
weight=mx.nd.random_normal(shape=(1,1)), bias=mx.nd.random_normal(shape=(1,)), 
flatten=False, num_hidden=1`
   
   ### Error Message
   ```python3: /home/ubuntu/mxnet/3rdparty/mkldnn/src/cpu/gemm/gemm.cpp:43: 
void dnnl::impl::cpu::msan_unpoison_matrix(void*, int, int, int, size_t): 
Assertion `C
   != nullptr && M > 0 && N > 0 && LDC >= M && typesize' failed.
   Aborted (core dumped)
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] connorgoggins opened a new issue #17595: MKLDNN incompatibility with large tensor (dim >= 2^32) data

Reply via email to