Adam1105 opened a new issue #20858:
URL: https://github.com/apache/incubator-mxnet/issues/20858


   ## Description
   I am using the latest release of v1.8.x mxnet installed with pip 
(mxnet-1.8.0.post0-cp39-cp39-macosx_10_13_x86_64.whl), more info in the 
Environment section. When using mkldnn and the NaiveEngine a model with an even 
number of channels in batch norm crashes in the backward call with an 
"MXNetError: Check failed: !is_view" error.
   
   This seems very similar to the issue described in the 
[bug](https://github.com/apache/incubator-mxnet/issues/19150). Apparently, it 
was fixed only for the forward pass.
   
   ### Error Message
   Is MKLDNN enabled: True
   input channel of 45
   [15:54:53] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
   input channel of 45, (1, 45, 8, 80, 80)
   input channel of 64
   Traceback (most recent call last):
     File "/Users/gabrysa/./buggy_model.py", line 67, in <module>
       l.backward()
     File "/usr/local/lib/python3.9/site-packages/mxnet/ndarray/ndarray.py", 
line 2864, in backward
       check_call(_LIB.MXAutogradBackwardEx(
     File "/usr/local/lib/python3.9/site-packages/mxnet/base.py", line 246, in 
check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: Traceback (most recent call last):
     File "../src/ndarray/ndarray.cc", line 650
   MXNetError: Check failed: !is_view:
   
   ## To Reproduce
   ### code 
   ```python
   from mxnet import init
   from mxnet.context import cpu
   from mxnet.gluon import nn, loss, Trainer
   from mxnet.gluon.block import HybridBlock
   from mxnet.gluon.nn import BatchNorm
   
   import mxnet as mx
   
   class BuggyModel(HybridBlock):
   
       def __init__(
           self,
           channels,
           norm_layer=BatchNorm,
           norm_kwargs=None,
           in_channels=3,
           **kwargs
       ):
           super(BuggyModel, self).__init__(**kwargs)
           self.in_channels = in_channels
           with self.name_scope():
               self.conv1 = nn.Conv3D(
                       in_channels=self.in_channels,
                       channels=channels,
                       kernel_size=(1, 7, 7),
                       strides=(1, 2, 2),
                       padding=(0, 3, 3),
                       use_bias=False,
                       )
               self.bn1 = norm_layer(in_channels=channels, **({} if norm_kwargs 
is None else norm_kwargs))
   
       def hybrid_forward(self, F, x):
           """Hybrid forward of R2+1D net"""
           x = self.conv1(x)
           x = self.bn1(x)
           return x
   
   print(f"Is MKLDNN enabled: {mx.runtime.Features().is_enabled('MKLDNN')}")
   
   print(f"input channel of 45")
   net = BuggyModel(channels=45)
   net.initialize(init=init.Constant(1))
   l2_loss = loss.L2Loss()
   trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
   
   input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
   with mx.autograd.record():
       output = net(input_data)
       target_data = mx.nd.ones(output.shape, ctx=mx.cpu())
       l = l2_loss(output, target_data)
   l.backward()
   
   print(f"input channel of 45, {output.shape}")
   
   print(f"input channel of 64")
   net = BuggyModel(channels=64)
   net.initialize(init=init.Constant(1))
   input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
   l2_loss = loss.L2Loss()
   trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
   
   input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
   with mx.autograd.record():
       output = net(input_data)
       target_data = mx.nd.ones(output.shape, ctx=mx.cpu())
       l = l2_loss(output, target_data)
   l.backward()
   print(f"input channel of 64, {output.shape}")
   ```
   
   ### Steps to reproduce
   1. paste above code to the ./code.py
   2. Run the code with MKLDNN using MXNet Naive Engine: 
`MXNET_ENGINE_TYPE=NaiveEngine python3 ./code.py`
   
   
   ## Environment
   <details>
   <summary>Environment Information</summary>
   
   ```
   ----------Python Info----------
   Version      : 3.9.6
   Compiler     : Clang 12.0.5 (clang-1205.0.22.9)
   Build        : ('default', 'Jun 29 2021 05:25:02')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 21.1.3
   Directory    : /usr/local/lib/python3.9/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.8.0
   Directory    : /usr/local/lib/python3.9/site-packages/mxnet
   Commit Hash   : 891d36c2d1c28f9486ec34ce4a7812e27896acef
   891d36c2d1c28f9486ec34ce4a7812e27896acef
   891d36c2d1c28f9486ec34ce4a7812e27896acef
   Library      : 
['/usr/local/lib/python3.9/site-packages/mxnet/libmxnet.dylib']
   Build features:
   ✖ CUDA
   ✖ CUDNN
   ✖ NCCL
   ✖ CUDA_RTC
   ✖ TENSORRT
   ✔ CPU_SSE
   ✔ CPU_SSE2
   ✔ CPU_SSE3
   ✔ CPU_SSE4_1
   ✖ CPU_SSE4_2
   ✖ CPU_SSE4A
   ✖ CPU_AVX
   ✖ CPU_AVX2
   ✖ OPENMP
   ✖ SSE
   ✖ F16C
   ✖ JEMALLOC
   ✖ BLAS_OPEN
   ✖ BLAS_ATLAS
   ✖ BLAS_MKL
   ✔ BLAS_APPLE
   ✔ LAPACK
   ✔ MKLDNN
   ✔ OPENCV
   ✖ CAFFE
   ✖ PROFILER
   ✖ DIST_KVSTORE
   ✖ CXX14
   ✖ INT64_TENSOR_SIZE
   ✔ SIGNAL_HANDLER
   ✖ DEBUG
   ✖ TVM_OP
   ----------Environment----------
   KMP_DUPLICATE_LIB_OK="True"
   KMP_INIT_AT_FORK="FALSE"
   ```
   
   </details>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to