[GitHub] [incubator-mxnet] RuRo opened a new issue #17844: Hybridized SoftReLU on the GPU is not numerically stable

GitBox Mon, 16 Mar 2020 05:21:28 -0700

RuRo opened a new issue #17844: Hybridized SoftReLU on the GPU is not 
numerically stable
URL: https://github.com/apache/incubator-mxnet/issues/17844
 
 
   ## Description
   This is a really weird issue, which causes the `SoftReLU` activation 
function to overflow on even fairly small input values. This issue only 
happens, when the activation function is executed as part of a hybridized graph 
on the GPU.
   
   ## To Reproduce
   ```python
   import mxnet as mx
   import numpy as np
   
   class SoftReLU(mx.gluon.HybridBlock):
       def hybrid_forward(self, F, x):
           x = F.identity(x)
           x = F.Activation(x, act_type='softrelu')
           x = F.identity(x)
           return x
   
   fun_normal = SoftReLU()
   fun_hybrid = SoftReLU()
   fun_hybrid.hybridize()
   inp = 100
   
   for f_name, fun in [('Normal', fun_normal), ('Hybrid', fun_hybrid)]:
       for c_name, ctx in [('CPU', mx.cpu()), ('GPU', mx.gpu())]:
           for dtype in [np.float16, np.float32, np.float64]:
               data = mx.nd.array([inp], ctx=ctx, dtype=dtype)
               result = fun(data)
               i_dtype = np.dtype(dtype).name
               o_dtype = np.dtype(result.dtype).name
               print(f"{f_name} on context {c_name} ({i_dtype} => {o_dtype}) 
SoftReLU({inp}) = {result.asnumpy().item()}")
   ```
   
   which prints
   
   ```
   Normal on context CPU (float16 => float16) SoftReLU(100) = 100.0
   Normal on context CPU (float32 => float32) SoftReLU(100) = 100.0
   Normal on context CPU (float64 => float64) SoftReLU(100) = 100.0
   Normal on context GPU (float16 => float16) SoftReLU(100) = 100.0
   Normal on context GPU (float32 => float32) SoftReLU(100) = 100.0
   Normal on context GPU (float64 => float64) SoftReLU(100) = 100.0
   Hybrid on context CPU (float16 => float16) SoftReLU(100) = 100.0
   Hybrid on context CPU (float32 => float32) SoftReLU(100) = 100.0
   Hybrid on context CPU (float64 => float64) SoftReLU(100) = 100.0
   Hybrid on context GPU (float16 => float16) SoftReLU(100) = inf
   Hybrid on context GPU (float32 => float32) SoftReLU(100) = inf
   Hybrid on context GPU (float64 => float64) SoftReLU(100) = inf
   ```
   
   I am suspecting, that the hybridized GPU implementation doesn't utilize the 
[`log-sum-exp` 
trick](https://en.wikipedia.org/wiki/LogSumExp#log-sum-exp_trick_for_log-domain_calculations)
 for numerical stability.
   
   I am not 100% sure why, but if the `SoftReLU` is right at the start or end 
of the graph, then this issue doesn't happen, so I had to add `F.identity` 
no-ops around it, but any other operation seems to also work, so it's not 
`identity` specific. Maybe it doesn't get Hybridized properly if it's at the 
edge of the graph or something.
   
   ## Environment
   
   I've reproduced this on 2 machines with different mxnet versions:
   
   <details>
   <summary><code>mxnet 1.5.1</code> installed from <code>pip</code></summary>
   
   ```
   curl --retry 10 -s 
https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | 
python
   
   ----------Python Info----------                                              
                                                                                
                 
   Version      : 3.7.5                                                         
                                                                                
                 
   Compiler     : GCC 8.3.0                                                     
                                                                                
                 
   Build        : ('default', 'Nov  7 2019 10:50:52')                           
                                                                                
                 
   Arch         : ('64bit', '')                                                 
                                                                                
                 
   ------------Pip Info-----------                                              
                                                                                
                 
   Version      : 20.0.2                                                        
                                                                                
                 
   Directory    : /usr/local/lib/python3.7/dist-packages/pip                    
                                                                                
                 
   ----------MXNet Info-----------                                              
                                                                                
                 
   Version      : 1.5.1                                                         
                                                                                
                 
   Directory    : /usr/local/lib/python3.7/dist-packages/mxnet                  
                                                                                
                 
   Num GPUs     : 6                                                             
                                                                                
                 
   Commit Hash   : c9818480680f84daa6e281a974ab263691302ba8                     
                                                                                
                 
   ----------System Info----------                                              
                                                                                
                 
   Platform     : Linux-4.20.0-042000-generic-x86_64-with-Ubuntu-18.04-bionic   
                                                                                
                 
   system       : Linux                                                         
                                                                                
                 
   node         : redacted                                                      
                                                                                
       
   release      : 4.20.0-042000-generic   
   version      : #201812232030 SMP Mon Dec 24 01:32:58 UTC 2018
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:        x86_64
   CPU op-mode(s):      32-bit, 64-bit
   Byte Order:          Little Endian
   CPU(s):              40
   On-line CPU(s) list: 0-39
   Thread(s) per core:  2
   Core(s) per socket:  10
   Socket(s):           2
   NUMA node(s):        2
   Vendor ID:           GenuineIntel
   CPU family:          6
   Model:               62
   Model name:          Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
   Stepping:            4
   CPU MHz:             3210.788
   CPU max MHz:         3600.0000
   CPU min MHz:         1200.0000
   BogoMIPS:            6000.01
   Virtualization:      VT-x
   L1d cache:           32K
   L1i cache:           32K
   L2 cache:            256K
   L3 cache:            25600K
   NUMA node0 CPU(s):   0-9,20-29
   NUMA node1 CPU(s):   10-19,30-39
   Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb ssbd ibrs 
ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt 
dtherm ida arat pln pts flush_l1d
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0012 
sec, LOAD: 0.7693 sec.
   Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0006 
sec, LOAD: 0.6054 sec.
   Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.2659 sec, LOAD: 
0.2331 sec.
   Timing for D2L: http://d2l.ai, DNS: 0.1060 sec, LOAD: 0.0806 sec.
   Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.2956 sec, LOAD: 0.0992 sec.
   Timing for FashionMNIST: 
https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, 
DNS: 0.2427 sec, LOAD: 0.9397 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0194 sec, LOAD: 
0.5106 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.2071 sec, 
LOAD: 0.9205 sec.
   ```
   
   </details>
   
   <details>
   <summary><code>mxnet</code> built from source from commit <a 
href="https://github.com/apache/incubator-mxnet/commit/34010ea417014a6b6203395d3359099431c362cf";><code>34010ea</code></a>
 (recent master)</summary>
   
   ```
   curl --retry 10 -s 
https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | 
python
   
   ----------Python Info----------
   Version      : 3.8.1
   Compiler     : GCC 9.2.0
   Build        : ('default', 'Jan 22 2020 06:38:00')
   Arch         : ('64bit', 'ELF')
   ------------Pip Info-----------
   Version      : 19.3.1
   Directory    : /usr/lib/python3.8/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.6.0
   Directory    : /usr/lib/python3.8/site-packages/mxnet
   Num GPUs     : 1
   Hashtag not found. Not installed from pre-built package.
   ----------System Info----------
   Platform     : Linux-5.4.24-1-MANJARO-x86_64-with-glibc2.2.5
   system       : Linux
   node         : redacted
   release      : 5.4.24-1-MANJARO
   version      : #1 SMP PREEMPT Thu Mar 5 20:29:25 UTC 2020
   ----------Hardware Info----------
   machine      : x86_64
   processor    : 
   Architecture:                    x86_64
   CPU op-mode(s):                  32-bit, 64-bit
   Byte Order:                      Little Endian
   Address sizes:                   39 bits physical, 48 bits virtual
   CPU(s):                          8
   On-line CPU(s) list:             0-7
   Thread(s) per core:              2
   Core(s) per socket:              4
   Socket(s):                       1
   NUMA node(s):                    1
   Vendor ID:                       GenuineIntel
   CPU family:                      6
   Model:                           158
   Model name:                      Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
   Stepping:                        9
   CPU MHz:                         3294.305
   CPU max MHz:                     3800.0000
   CPU min MHz:                     800.0000
   BogoMIPS:                        5602.18
   Virtualization:                  VT-x
   L1d cache:                       128 KiB
   L1i cache:                       128 KiB
   L2 cache:                        1 MiB
   L3 cache:                        6 MiB
   NUMA node0 CPU(s):               0-7
   Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
   Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional 
cache flushes, SMT vulnerable
   Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT 
vulnerable
   Vulnerability Meltdown:          Mitigation; PTI
   Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass 
disabled via prctl and seccomp
   Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and 
__user pointer sanitization
   Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB 
conditional, IBRS_FW, STIBP conditional, RSB filling
   Vulnerability Tsx async abort:   Not affected
   Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep 
mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rd
                                    tscp lm constant_tsc art arch_perfmon pebs 
bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 
monitor ds_cpl vmx e
                                    st tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid 
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand 
lahf_lm abm 3dnowpre
                                    fetch cpuid_fault epb invpcid_single pti 
ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase 
tsc_adjust bmi1 avx2 smep
                                     bmi2 erms invpcid mpx rdseed adx smap 
clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp 
hwp_notify hwp_act_wind
                                    ow hwp_epp md_clear flush_l1d
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0233 
sec, LOAD: 0.8981 sec.
   Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0160 
sec, LOAD: 0.5953 sec.
   Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0159 sec, LOAD: 
0.2458 sec.
   Timing for D2L: http://d2l.ai, DNS: 0.0401 sec, LOAD: 0.3481 sec.
   Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0384 sec, LOAD: 0.2148 sec.
   Timing for FashionMNIST: 
https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, 
DNS: 0.1041 sec, LOAD: 0.2033 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0381 sec, LOAD: 
1.0081 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0170 sec, 
LOAD: 0.3241 sec.
   ```
   
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] RuRo opened a new issue #17844: Hybridized SoftReLU on the GPU is not numerically stable

Reply via email to