leezu opened a new pull request #19185:
URL: https://github.com/apache/incubator-mxnet/pull/19185


   ## Description ##
   Resubmit https://github.com/apache/incubator-mxnet/pull/19034, which was temporarily reverted due to oneDNN issues with GCC 8.
   
   @TaoLV can your team help debug / fix the oneDNN issues?
   
   When gcc8 and oneDNN 1.6.3 are both present, the following test fails because the output contains NaN values:
   
   ```
   [2020-09-17T17:48:04.979Z] ______________________ test_dc_hybridblock_deferred_init _______________________
   [2020-09-17T17:48:04.979Z] [gw0] linux -- Python 3.6.9 /opt/rh/rh-python36/root/usr/bin/python3
   [2020-09-17T17:48:04.979Z] 
   [2020-09-17T17:48:04.979Z]     def test_dc_hybridblock_deferred_init():
   [2020-09-17T17:48:04.979Z]         class MyBlock(mx.gluon.HybridBlock):
   [2020-09-17T17:48:04.979Z]             def __init__(self):
   [2020-09-17T17:48:04.979Z]                 super().__init__()
   [2020-09-17T17:48:04.979Z]                 self.dense = mx.gluon.nn.Dense(units=10)
   [2020-09-17T17:48:04.979Z]                 self.weight = mx.gluon.Parameter('weight', allow_deferred_init=True)
   [2020-09-17T17:48:04.979Z]     
   [2020-09-17T17:48:04.979Z]             def infer_shape(self, x):
   [2020-09-17T17:48:04.979Z]                 self.weight.shape = (x.shape[1], )
   [2020-09-17T17:48:04.979Z]     
   [2020-09-17T17:48:04.979Z]             def forward(self, x):
   [2020-09-17T17:48:04.979Z]                 return self.dense(x) + self.weight.data(x.context)
   [2020-09-17T17:48:04.979Z]     
   [2020-09-17T17:48:04.979Z]         net = MyBlock()
   [2020-09-17T17:48:04.979Z]         net.initialize()
   [2020-09-17T17:48:04.979Z] >       _assert_dc_gluon(_dc_gluon_simple_setup, net, numpy=False)
   [2020-09-17T17:48:04.979Z] 
   [2020-09-17T17:48:04.979Z] tests/python/unittest/test_deferred_compute.py:504: 
   [2020-09-17T17:48:04.979Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
   [2020-09-17T17:48:04.979Z] tests/python/unittest/test_deferred_compute.py:421: in _assert_dc_gluon
   [2020-09-17T17:48:04.979Z]     _all_same(ys_np, ys_hybrid_np)
   [2020-09-17T17:48:04.979Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
   [2020-09-17T17:48:04.979Z] 
   [2020-09-17T17:48:04.979Z] arrays1 = [array([        nan,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z]         0.07460972, -0.08127148, -0.32424796,...33878, -0.10624887,
   [2020-09-17T17:48:04.979Z]         0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z]       dtype=float32), ...]
   [2020-09-17T17:48:04.979Z] arrays2 = [array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z]         0.07460972, -0.08127148, -0.32424796,...33878, -0.10624887,
   [2020-09-17T17:48:04.979Z]         0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z]       dtype=float32), ...]
   [2020-09-17T17:48:04.979Z] message = ''
   [2020-09-17T17:48:04.979Z] 
   [2020-09-17T17:48:04.979Z]     def _all_same(arrays1, arrays2, message=''):
   [2020-09-17T17:48:04.979Z]         same = all(np.array_equal(a1, a2) for a1, a2 in zip(arrays1, arrays2))
   [2020-09-17T17:48:04.979Z]         if not same:
   [2020-09-17T17:48:04.979Z] >           raise AssertionError('Arrays not equal ({}):\n{}\n\n{}'.format(message, arrays1, arrays2))
   [2020-09-17T17:48:04.979Z] E           AssertionError: Arrays not equal ():
   [2020-09-17T17:48:04.979Z] E           [array([        nan,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([        nan,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([        nan,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32)]
   [2020-09-17T17:48:04.979Z] E           
   [2020-09-17T17:48:04.979Z] E           [array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32), array([ 0.01286458,  0.2107217 , -0.06851891,  0.16233878, -0.10624887,
   [2020-09-17T17:48:04.979Z] E                   0.07460972, -0.08127148, -0.32424796, -0.0124862 , -0.1862593 ],
   [2020-09-17T17:48:04.979Z] E                 dtype=float32)]
   ```
   
   Either reverting to gcc7 (which was done) or reverting the oneDNN update (https://github.com/apache/incubator-mxnet/pull/19180) fixes the issue.
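
To make the failure above easier to read, here is a minimal sketch of a helper that lists only the differing elements instead of dumping every array. `describe_mismatch` is a hypothetical name, not part of the MXNet test suite. It assumes one-dimensional arrays (plain sequences of floats work too) and, unlike the exact comparison in the log, treats two NaNs at the same position as equal.

```python
import math


def describe_mismatch(arrays1, arrays2):
    """Return (array_index, element_index, value1, value2) for each differing element.

    Hypothetical debugging helper; assumes 1-D sequences of floats.
    """
    diffs = []
    for i, (a1, a2) in enumerate(zip(arrays1, arrays2)):
        for j, (v1, v2) in enumerate(zip(a1, a2)):
            v1, v2 = float(v1), float(v2)
            # NaN never compares equal to itself, so handle that case explicitly.
            both_nan = math.isnan(v1) and math.isnan(v2)
            if v1 != v2 and not both_nan:
                diffs.append((i, j, v1, v2))
    return diffs


# Values abbreviated from the log above: only element 0 of the first array differs.
arrays1 = [[float("nan"), 0.2107217, -0.06851891]]
arrays2 = [[0.01286458, 0.2107217, -0.06851891]]
for i, j, v1, v2 in describe_mismatch(arrays1, arrays2):
    print(f"array {i}, element {j}: {v1} vs {v2}")
```

In the log, the only difference between the two lists is element 0 of the first three arrays, which is NaN in `arrays1` and 0.01286458 in `arrays2`.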



