chinakook opened a new issue #18765:
URL: https://github.com/apache/incubator-mxnet/issues/18765


   ## Description
   Severe bug with `nn.SymbolBlock` when `ctx=mx.gpu(0)`; the same code runs fine on CPU.
   
   ### Error Message
   A `malloc` or `free` error, or a segmentation fault, appears at random. Three separate runs of the same script produced the following:
   ```
   
   /home/xxxxxx/anaconda3/envs/solo/lib/python3.7/site-packages/mxnet/gluon/block.py:1517: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
           data: None
     input_sym_arg_type = in_param.infer_type()[0]
   [17:15:59] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   [(1, 256, 56, 56), (1, 512, 28, 28), (1, 1024, 14, 14), (1, 2048, 7, 7)]
   malloc(): unsorted double linked list corrupted
   [1]    87116 abort (core dumped)  python symbolblockbug.py
   
   ```
   
   ```
   
   /home/xxxxxx/anaconda3/envs/solo/lib/python3.7/site-packages/mxnet/gluon/block.py:1517: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
           data: None
     input_sym_arg_type = in_param.infer_type()[0]
   [17:21:29] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   [(1, 256, 56, 56), (1, 512, 28, 28), (1, 1024, 14, 14), (1, 2048, 7, 7)]
   
   Segmentation fault: 11
   
   ```
   
   ```
   
   /home/xxxxxx/anaconda3/envs/solo/lib/python3.7/site-packages/mxnet/gluon/block.py:1517: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
           data: None
     input_sym_arg_type = in_param.infer_type()[0]
   [17:23:24] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   [(1, 256, 56, 56), (1, 512, 28, 28), (1, 1024, 14, 14), (1, 2048, 7, 7)]
   malloc_consolidate(): invalid chunk size
   [1]    87701 abort (core dumped)  python symbolblockbug.py
   
   ```
   ## To Reproduce
   ```python
   import mxnet as mx
   from mxnet import gluon
   from mxnet.gluon import nn
   import gluoncv as gcv
   class NetEncoder(nn.SymbolBlock):
       def __init__(self, **kwargs):
           base_network = gcv.model_zoo.resnet50_v1(pretrained=False)
           outputs = ['stage1_activation2', 'stage2_activation3',
                      'stage3_activation5', 'stage4_activation2']
   
           inputs, outputs, params = gcv.nn.feature._parse_network(
               base_network, outputs, ['data'], pretrained=False, ctx=mx.cpu(),
               **kwargs)
           super(NetEncoder, self).__init__(outputs, inputs, params=params)
       
   class Foo(nn.HybridBlock):
       def __init__(self):
           super(Foo, self).__init__()
           self.features = NetEncoder()
   
       def hybrid_forward(self, F, x):
           y = self.features(x)
           return y
   
   a = mx.nd.random.uniform(shape=(1,3,224,224), ctx=mx.gpu(0))
   
   f = Foo()
   f.collect_params().initialize()
   f.hybridize()
   f.collect_params().reset_ctx(mx.gpu(0))
   b = f(a)
   print([x.shape for x in b])
   ```
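
Because the failure mode changes from run to run, it may help to run the script several times and tally how it exits. The following is a minimal sketch using only the Python standard library; the helper name `tally_exit_codes` is hypothetical, and it assumes the reproduction script is saved as `symbolblockbug.py` (the name shown in the logs above). Passing `MXNET_CUDNN_AUTOTUNE_DEFAULT=0`, the variable named in the log message, is only a way to check whether the cuDNN autotune step is involved, not a known fix.

```python
import os
import subprocess
import sys
from collections import Counter


def tally_exit_codes(cmd, runs=10, extra_env=None):
    """Run `cmd` `runs` times and count how often each exit code occurs.

    On POSIX, a negative code means the process was killed by a signal,
    e.g. -11 for SIGSEGV and -6 for SIGABRT.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    codes = Counter()
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes[result.returncode] += 1
    return codes


# Example usage (assumes symbolblockbug.py is in the current directory):
#   cmd = [sys.executable, "symbolblockbug.py"]
#   print(tally_exit_codes(cmd))
#   print(tally_exit_codes(cmd, extra_env={"MXNET_CUDNN_AUTOTUNE_DEFAULT": "0"}))
```

Comparing the two tallies would show whether disabling autotune changes how often the crash occurs.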
   
   
   
   ## Environment
   1. mxnet_cu102-1.7.0b20200719-py2.py3-none-manylinux2014_x86_64
   2. mxnet 2.0 master in April

