[GitHub] azai91 opened a new issue #13141: MKLDNN softmax outputs NaN in mkldnn 0.14

GitBox Tue, 06 Nov 2018 12:03:27 -0800

URL: https://github.com/apache/incubator-mxnet/issues/13141
 
 
   
   ## Description
   Softmax returns NaN for extremely negative inputs (e.g. -1e30). This bug has already been reported upstream in MKLDNN (https://github.com/intel/mkl-dnn/issues/106). A fix (https://gist.github.com/emfomenk/0386c529c5df21ae308b00d16454c48e) is included in MKLDNN v0.15+, but MXNet currently uses v0.14.
   
   There are two ways to fix this:
   1. Patch MKLDNN v0.14 with the upstream fix.
   2. Upgrade the MKLDNN version used by MXNet (https://github.com/apache/incubator-mxnet/pull/12953).
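For background, here is a plain NumPy sketch of how extremely negative inputs can turn a softmax into NaN. It is not MKLDNN's actual code, and NumPy is an assumption of this example rather than part of the report. If `exp()` underflows to 0 for every element, the normalization becomes 0/0. The standard remedy is to subtract the per-axis maximum first.

```python
import numpy as np

def naive_softmax(x, axis=-1):
    # exp() of very negative inputs underflows to 0.0, so the sum can be 0
    # and the division becomes 0/0 = NaN
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def stable_softmax(x, axis=-1):
    # Subtracting the max along the axis does not change the result
    # mathematically, but the largest exponent becomes exp(0) = 1,
    # so the denominator is always >= 1
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([-1e30, -1e30], dtype=np.float32)
with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning
    print(naive_softmax(x))  # [nan nan]
print(stable_softmax(x))     # [0.5 0.5]
```

The max-subtraction trick costs one extra reduction per axis. In exchange, it keeps the output finite for any finite input.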
   
   
   ## Environment info (Required)
   ```
   ubuntu@ip-172-31-3-217:~$ python diagnose.py
   ----------Python Info----------
   Version      : 3.6.4
   Compiler     : GCC 7.2.0
   Build        : ('default', 'Jan 16 2018 18:10:19')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 9.0.1
   Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
   ----------MXNet Info-----------
   /home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
     from ._conv import register_converters as _register_converters
   Version      : 1.3.0
   Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet
   Commit Hash   : b3be92f4a48bce62a5a8424271871c2f81c8f7f1
   ----------System Info----------
   Platform     : Linux-4.4.0-1065-aws-x86_64-with-debian-stretch-sid
   system       : Linux
   node         : ip-172-31-3-217
   release      : 4.4.0-1065-aws
   version      : #75-Ubuntu SMP Fri Aug 10 11:14:32 UTC 2018
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:          x86_64
   CPU op-mode(s):        32-bit, 64-bit
   Byte Order:            Little Endian
   CPU(s):                72
   On-line CPU(s) list:   0-71
   Thread(s) per core:    2
   Core(s) per socket:    18
   Socket(s):             2
   NUMA node(s):          2
   Vendor ID:             GenuineIntel
   CPU family:            6
   Model:                 85
   Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
   Stepping:              3
   CPU MHz:               3000.000
   BogoMIPS:              6000.00
   Hypervisor vendor:     KVM
   Virtualization type:   full
   L1d cache:             32K
   L1i cache:             32K
   L2 cache:              1024K
   L3 cache:              25344K
   NUMA node0 CPU(s):     0-17,36-53
   NUMA node1 CPU(s):     18-35,54-71
   Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0012 sec, LOAD: 0.4806 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1717 sec, LOAD: 0.5293 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1596 sec, LOAD: 0.3734 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0262 sec, LOAD: 0.1173 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0013 sec, LOAD: 0.3264 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0118 sec, LOAD: 0.0690 sec.
   ```
   
   Package used (Python/R/Scala/Julia):
   Python
   
   
   ## Build info (Required if built from source)
   
   Compiler (gcc/clang/mingw/visual studio):
   
   MXNet commit hash:
   6b5d9f9785a398d2e8ccaa950f89fb76d76d5bd4
   
   Build config:
   MKLDNN (pip install mxnet-mkl)
   
   ## Error Message:
   ```
   ubuntu@ip-172-31-3-217:~/incubator-mxnet$ python tt.py
   /home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
     from ._conv import register_converters as _register_converters
   [
   [[[[nan nan]]]]
   <NDArray 1x1x1x2 @cpu(0)>]
   ```
   
   ## Minimum reproducible example
   ```python
   import mxnet as mx

   # Shape (1, 1, 1, 2), every entry extremely negative
   input_data = mx.nd.array([[[[-1e30, -1e30]]]])
   data = mx.sym.Variable('data')
   out1 = data.softmax(axis=1)
   # 'data' is the only argument of this graph; no other inputs are needed
   exec1 = out1.bind(mx.cpu(), args={'data': input_data})
   exec1.forward()[0].wait_to_read()
   print(exec1.outputs)  # with MKLDNN v0.14 this prints [[[[nan nan]]]]
   ```
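For comparison, a NumPy reference computed on the same input and axis shows what a correct backend should return. NumPy is an assumption here and is not part of the original report. Along `axis=1` each element is alone (that axis has size 1), so every softmax value should be exactly 1.0, not NaN.

```python
import numpy as np

# Same input as the repro: shape (1, 1, 1, 2), all entries -1e30
x = np.full((1, 1, 1, 2), -1e30, dtype=np.float32)

# Max-subtracted softmax along axis=1, mirroring data.softmax(axis=1)
shifted = x - x.max(axis=1, keepdims=True)
e = np.exp(shifted)
reference = e / e.sum(axis=1, keepdims=True)

print(reference)  # [[[[1. 1.]]]]
```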
   
   ## Steps to reproduce
   Run the script above.
   
   
   ## What have you tried to solve it?
   
   Applying this one-line fix (https://gist.github.com/emfomenk/0386c529c5df21ae308b00d16454c48e) to MKLDNN resolves the issue.


