connorgoggins opened a new issue #17716: [Large Tensor] linalg ops fail w/input 
dim >= 2**32
URL: https://github.com/apache/incubator-mxnet/issues/17716
 
 
   ## Description
   While testing the `linalg_*` ops on large tensor data (2**32 or more 
elements, e.g. shape `(2**16, 2**16)`), I found that all of these ops fail 
with a segmentation fault. In a test run with the `linalg_det` op, I traced 
the error to line 485 of `src/operator/tensor/la_op-inl.h`. This line lies 
within a `void` function named `Map`, whose parameters include an `int` i, an 
`int` N, and an `int*` pivot. The fault occurs in the function's loop, where 
an `int` j is incremented up to the value of `int` N.
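
To illustrate why `int`-typed indices are a plausible suspect here, the sketch below emulates 32-bit signed arithmetic in Python. It is an illustration only and does not reproduce the actual code in `la_op-inl.h`. The helper name `as_int32` is hypothetical, and the assumption that a flattened offset as large as `N * N` gets computed in a 32-bit `int` is a guess, not something confirmed by the trace.

```python
import ctypes


def as_int32(value):
    """Wrap a Python integer the way a 32-bit signed C `int` would.

    Hypothetical helper for illustration; ctypes does no overflow
    checking, so out-of-range values silently wrap around.
    """
    return ctypes.c_int32(value).value


# Side lengths used in the two scripts from this report.
for n in (2**15, 2**16):
    exact = n * n              # element count as an unbounded Python int
    wrapped = as_int32(exact)  # the same value squeezed into 32 bits
    print(f"N = {n}: N*N = {exact}, as 32-bit int = {wrapped}")
```

For `N = 2**15` the product still fits in 32 bits, while for `N = 2**16` it wraps to a different value. An offset computed this way would point at the wrong memory, which is consistent with a segmentation fault.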
   
   This error occurred irrespective of which BLAS engine MXNet was built with 
(MKL or OpenBLAS).
   
   ## Environment
   ```
   ----------Python Info----------
   Version      : 3.6.6
   Compiler     : GCC 7.2.0
   Build        : ('default', 'Jun 28 2018 17:14:51')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 19.3.1
   Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.6.0
   Directory    : /home/ubuntu/forked-mxnet/python/mxnet
   Num GPUs     : 0
   Hashtag not found. Not installed from pre-built package.
   ----------System Info----------
   Platform     : Linux-4.4.0-1102-aws-x86_64-with-debian-stretch-sid
   system       : Linux
   node         : ip-172-31-41-238
   release      : 4.4.0-1102-aws
   version      : #113-Ubuntu SMP Wed Jan 29 14:54:54 UTC 2020
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:          x86_64
   CPU op-mode(s):        32-bit, 64-bit
   Byte Order:            Little Endian
   CPU(s):                96
   On-line CPU(s) list:   0-95
   Thread(s) per core:    2
   Core(s) per socket:    24
   Socket(s):             2
   NUMA node(s):          2
   Vendor ID:             GenuineIntel
   CPU family:            6
   Model:                 85
   Model name:            Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
   Stepping:              7
   CPU MHz:               2499.998
   BogoMIPS:              4999.99
   Hypervisor vendor:     KVM
   Virtualization type:   full
   L1d cache:             32K
   L1i cache:             32K
   L2 cache:              1024K
   L3 cache:              36608K
   NUMA node0 CPU(s):     0-23,48-71
   NUMA node1 CPU(s):     24-47,72-95
   Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
   ```
   ### MXNet build flags
   #### BLAS = MKL
   ✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ 
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✖ BLAS_OPEN, ✖ BLAS_ATLAS, ✔ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✔ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP
   
   #### BLAS = OpenBLAS
   ✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ 
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✔ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP
   
   ## Steps to reproduce
   ### Script
   Create a Python script with the following content:
   ```
   from mxnet import nd
   print(nd.linalg_det(A=nd.random_normal(shape=(2**16, 2**16))))
   ```
   and run it with Python 3.
   
   ### Error
   With both BLAS engines, the error is the same:
   ```
   Segmentation fault (core dumped)
   ```
   
   ## Additional Information
   The `linalg` ops do not throw errors on smaller data (dimension < 2**32). 
See the following example script and output:
   
   ### Script
   ```
   from mxnet import nd
   print(nd.linalg_det(A=nd.random_normal(shape=(2**15, 2**15))))
   ```
   
   ### Output
   ```
   [inf]
   <NDArray 1 @cpu(0)>
   ```
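
For reference, the sizes of the two test inputs can be worked out with a short sketch. It assumes 4 bytes per element (float32, MXNet's default `nd` dtype), and the helper name `footprint` is hypothetical.

```python
from math import prod  # requires Python 3.8+


def footprint(shape, bytes_per_element=4):
    """Return (element_count, size_in_GiB) for a dense tensor of `shape`.

    Assumes 4-byte float32 elements unless told otherwise.
    """
    count = prod(shape)
    return count, count * bytes_per_element / 2**30


# The passing and failing shapes from the scripts in this report.
for shape in [(2**15, 2**15), (2**16, 2**16)]:
    count, gib = footprint(shape)
    print(f"shape={shape}: {count} elements, {gib} GiB")
```

The passing input has 2**30 elements (4 GiB) and the failing input has 2**32 elements (16 GiB), so reproducing the fault needs a machine with enough memory for the larger array.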
   
