azai91 opened a new issue #13141: MKLDNN softmax outputs NaN in mkldnn 0.14
URL: https://github.com/apache/incubator-mxnet/issues/13141

## Description
Extremely negative softmax inputs produce NaN. This bug has already been reported in MKLDNN (https://github.com/intel/mkl-dnn/issues/106) and has a fix (https://gist.github.com/emfomenk/0386c529c5df21ae308b00d16454c48e) in MKLDNN v0.15+ (we are on v0.14). The fix is either to:

1. patch MKLDNN v0.14 with the earlier fix
2. upgrade the MKLDNN version in mxnet (https://github.com/apache/incubator-mxnet/pull/12953).

## Environment info (Required)

```
ubuntu@ip-172-31-3-217:~$ python diagnose.py
----------Python Info----------
Version : 3.6.4
Compiler : GCC 7.2.0
Build : ('default', 'Jan 16 2018 18:10:19')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 9.0.1
Directory : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Version : 1.3.0
Directory : /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet
Commit Hash : b3be92f4a48bce62a5a8424271871c2f81c8f7f1
----------System Info----------
Platform : Linux-4.4.0-1065-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-31-3-217
release : 4.4.0-1065-aws
version : #75-Ubuntu SMP Fri Aug 10 11:14:32 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping: 3
CPU MHz: 3000.000
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0012 sec, LOAD: 0.4806 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1717 sec, LOAD: 0.5293 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1596 sec, LOAD: 0.3734 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0262 sec, LOAD: 0.1173 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0013 sec, LOAD: 0.3264 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0118 sec, LOAD: 0.0690 sec.
```

Package used (Python/R/Scala/Julia): Python

## Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):

MXNet commit hash: 6b5d9f9785a398d2e8ccaa950f89fb76d76d5bd4

Build config: MKLDNN (pip install mxnet-mkl)

## Error Message:

```
ubuntu@ip-172-31-3-217:~/incubator-mxnet$ python tt.py
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
[
[[[[nan nan]]]]
<NDArray 1x1x1x2 @cpu(0)>]
```

## Minimum reproducible example

```
import mxnet as mx

input_data = mx.nd.array([[[[-1e30,-1e30]]]])
data = mx.sym.Variable('data')
out1 = data.softmax(axis=1)
exec1 = out1.bind(mx.cpu(), args={'data': input_data, 'softmax_label': mx.nd.ones([1]), 'fc_weight': mx.nd.ones([2,2]), 'fc1_weight': mx.nd.ones([2,2])})
exec1.forward()[0].wait_to_read()
print(exec1.outputs)
```

## Steps to reproduce

Run the script above.

## What have you tried to solve it?

Applying this one-line fix (https://gist.github.com/emfomenk/0386c529c5df21ae308b00d16454c48e) to MKLDNN fixes the issue.
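For reference, here is a minimal NumPy sketch of what the output should look like for the input in the reproducer. It is not the MKLDNN fix and makes no claim about the root cause inside MKLDNN; the helper name `stable_softmax` is hypothetical. It only shows that a softmax which subtracts the per-axis maximum before exponentiating stays finite for `-1e30`, while an unshifted softmax (one common way to get NaN, shown purely as a contrast) divides zero by zero.

```python
import numpy as np


def stable_softmax(x, axis):
    """Softmax that subtracts the per-axis max before exponentiating.

    Hypothetical reference helper, not MXNet or MKLDNN code.
    """
    x = np.asarray(x, dtype=np.float32)
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)


# Same shape and values as the reproducer: a 1x1x1x2 array of -1e30.
x = np.full((1, 1, 1, 2), -1e30, dtype=np.float32)

# axis=1 has length 1, so every element should be exactly 1.0.
expected_axis1 = stable_softmax(x, axis=1)

# Over the last axis, two equal inputs should each get 0.5.
expected_last = stable_softmax(x, axis=-1)

# Contrast: without the max shift, exp(-1e30) underflows to 0 and 0/0 is NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    exps = np.exp(x)
    naive = exps / np.sum(exps, axis=1, keepdims=True)

print(expected_axis1)
print(expected_last)
print(naive)
```

With `axis=1` as in the reproducer, a correct softmax should therefore return all ones rather than NaN.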
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
