samskalicky opened a new issue #14727: shape input names order mismatch after 
partitioning
URL: https://github.com/apache/incubator-mxnet/issues/14727
 
 
   ## Description
   The input names of a symbol are produced by a DFS traversal of the symbol's graph from the outputs back up to the inputs. During graph partitioning, some nodes are moved into subgraphs, which can change the order of that DFS traversal. After partitioning, shape propagation runs, and the inferred shapes for the inputs are returned in the order in which the inputs appear in the (post-partitioning) DFS traversal.
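   As a toy illustration (plain Python with hypothetical node tuples, not MXNet's actual graph API), the order in which inputs are discovered falls out of the DFS from the outputs:

```python
# Toy graph: each node is (name, list_of_input_nodes).
# Leaves (nodes with no inputs) play the role of the symbol's inputs;
# a DFS from the output node discovers them in graph-structure order.
def input_order(node, seen=None):
    """Collect leaf (input) names in the order a DFS from `node` finds them."""
    if seen is None:
        seen = set()
    name, deps = node
    if name in seen:
        return []
    seen.add(name)
    if not deps:          # a leaf: this is an input
        return [name]
    order = []
    for dep in deps:      # recurse into inputs depth-first
        order += input_order(dep, seen)
    return order

# out = conv(data, weight) + bias   (toy structure)
data   = ("data", [])
weight = ("weight", [])
bias   = ("bias", [])
conv   = ("conv", [data, weight])
out    = ("out", [conv, bias])

print(input_order(out))   # ['data', 'weight', 'bias']
```

   If partitioning folded `conv` and `weight` into a single subgraph node whose inputs happened to be listed as `[weight, data]`, the same traversal would yield `['weight', 'data', 'bias']` instead.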
   
   When partitioning changes the DFS traversal order, the inferred shapes are therefore returned in a different order than expected. Since the original symbol is not modified, the caller expects the shapes in the original symbol's input order.
   
   Since the DFS order is not guaranteed to be identical before and after partitioning, we need to map input names to shapes and return the shapes in the original order.
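   A minimal sketch of that mapping step (a hypothetical helper, not the actual patch): pair each inferred shape with its input name, then re-emit the shapes in the original symbol's argument order.

```python
# Hypothetical helper: re-order inferred shapes by input name so the
# caller sees them in the original (pre-partitioning) argument order.
def reorder_shapes(original_names, partitioned_names, partitioned_shapes):
    by_name = dict(zip(partitioned_names, partitioned_shapes))
    return [by_name[name] for name in original_names]

# The caller's symbol listed ['data', 'weight', 'bias'], but after
# partitioning the DFS happened to yield ['weight', 'data', 'bias'].
orig_names = ['data', 'weight', 'bias']
part_names = ['weight', 'data', 'bias']
part_shapes = [(64, 3, 7, 7), (1, 3, 224, 224), (64,)]

print(reorder_shapes(orig_names, part_names, part_shapes))
# [(1, 3, 224, 224), (64, 3, 7, 7), (64,)]
```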
   
   ## Environment info (Required)
   The error occurs on every release and is reproducible on the master branch. I have built from source on the master branch and reproduced the problem.
   
   ## Error Message:
   ```
    Traceback (most recent call last):
      File "run.py", line 139, in <module>
        mod.set_params(arg_params, aux_params, allow_missing=True)
      File "/home/ubuntu/mxnet/python/mxnet/module/module.py", line 358, in set_params
        self._exec_group.set_params(arg_params, aux_params, allow_extra=allow_extra)
      File "/home/ubuntu/mxnet/python/mxnet/module/executor_group.py", line 413, in set_params
        exec_.copy_params_from(arg_params, aux_params, allow_extra_params=allow_extra)
      File "/home/ubuntu/mxnet/python/mxnet/executor.py", line 361, in copy_params_from
        array.astype(dst.dtype).copyto(dst)
      File "/home/ubuntu/mxnet/python/mxnet/ndarray/ndarray.py", line 2089, in copyto
        return _internal._copyto(self, out=other)
      File "<string>", line 25, in _copyto
      File "/home/ubuntu/mxnet/python/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
        ctypes.byref(out_stypes)))
      File "/home/ubuntu/mxnet/python/mxnet/base.py", line 254, in check_call
        raise MXNetError(py_str(_LIB.MXGetLastError()))
    mxnet.base.MXNetError: [22:29:46] src/operator/random/./../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node  at 0-th output: expected [1,1,128,128,60], got [15,1024,1,1]
    Stack trace:
      [bt] (0) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7fe5684779a2]
      [bt] (1) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(bool mxnet::op::ElemwiseAttr<mxnet::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string[abi:cxx11], -1l, -1l>(nnvm::NodeAttrs const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, mxnet::TShape const&)::{lambda(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*)#1}::operator()(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*) const+0x2202) [0x7fe56868d322]
      [bt] (2) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(bool mxnet::op::ElemwiseShape<1l, 1l>(nnvm::NodeAttrs const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*)+0x410) [0x7fe568692db0]
      [bt] (3) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0xe8a) [0x7fe56a74d87a]
      [bt] (4) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x368) [0x7fe56a753a28]
      [bt] (5) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvokeImpl(void*, int, void**, int*, void***, int, char const**, char const**)+0xb2a) [0x7fe56ae4fd6a]
      [bt] (6) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvokeEx+0x534) [0x7fe56ae518f4]
      [bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fe578ef5e40]
      [bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fe578ef58ab]
   ```
   
   ## Minimum reproducible example
   This problem occurs with several models; the one I can share is the Faster R-CNN model from the GluonCV package. Here is how to get the model:
   
   ```
    # download the pretrained Faster R-CNN model from the GluonCV model zoo
   import gluoncv as cv
   model = cv.model_zoo.faster_rcnn_resnet50_v1b_coco(pretrained=True)
    im_fname = cv.utils.download('https://github.com/dmlc/web-data/blob/master/gluoncv/detection/biking.jpg?raw=true',
                                 path='biking.jpg')
    
   x, orig_img = cv.data.transforms.presets.rcnn.load_test(im_fname)
   model.hybridize()
   box_ids, scores, bboxes = model(x)
   model.export('faster-rcnn')
   ```
   
   Once the model is exported, here is the code to reproduce the error using the CPU context:
   
   ```
   import mxnet as mx
   import numpy as np
   from collections import namedtuple
   Batch = namedtuple('Batch', ['data'])
   import os
   from mxnet.base import _LIB, check_call, c_str, mx_uint, c_str_array
   
   op_names = [
                "_add",
               "_contrib_MultiBoxDetection",
               "_contrib_MultiBoxPrior",
               "_contrib_MultiBoxTarget",
               "_copy",
               "_div_scalar",
               "_DivScalar",
               "_minus",
               "_Minus",
               "_minus_scalar",
               "_MinusScalar",
               "_mul",
               "_Mul",
               "_mul_scalar",
               "_MulScalar",
               "_plus",
               "_Plus",
               "_plus_scalar",
               "_PlusScalar",
               "_rdiv_scalar",
               "_RDivScalar",
               "_rminus_scalar",
               "_RMinusScalar",
               "_rnn_param_concat",
               "_sub",
               "abs",
               "Activation",
               "arccos",
               "arccosh",
               "arcsin",
               "arcsinh",
               "arctan",
               "arctanh",
               "argmax",
               "argmin",
               "BatchNorm",
               "BatchNorm_v1",
               "BlockGrad",
               "broadcast_add",
               "broadcast_equal",
               "broadcast_greater",
               "broadcast_greater_equal",
               "broadcast_lesser",
               "broadcast_lesser_equal",
               "broadcast_mul",
               "broadcast_not_equal",
               "broadcast_plus",
               "cast",
               "Cast",
               "clip",
               "concat",
               "Concat",
               "Convolution",
               "Convolution_v1",
               "cos",
               "cosh",
               "crop",
               "Deconvolution",
               "Dropout",
               "elemwise_add",
               "elemwise_mul",
               "elemwise_sub",
               "Embedding",
               "exp",
               "expand_dims",
               "flatten",
               "Flatten",
               "flip",
               "FullyConnected",
                "identity",
               "LeakyReLU",
               "LinearRegressionOutput",
               "log",
               "log_softmax",
               "LRN",
               "make_loss",
               "MakeLoss",
               "max",
               "max_axis",
               "mean",
               "min",
               "min_axis",
               "negative",
               "one_hot",
               "pad",
               "Pad",
               "pick",
               "Pooling",
               "Pooling_v1",
               "prod",
               "reciprocal",
               "relu",
               "repeat",
               "reshape",
               "Reshape",
               "reverse",
               "RNN",
               "rsqrt",
               "sigmoid",
               "sin",
               "sinh",
               "slice",
               "SliceChannel",
               "softmax",
               "SoftmaxActivation",
               "SoftmaxOutput",
               "softmin",
               "split",
               "sqrt",
               "sum",
               "sum_axis",
               "tan",
               "tanh",
               "tile",
               "topk",
               "transpose",
               "zeros_like"
   ]
   check_call(_LIB.MXSetSubgraphPropertyOpNames(c_str("default"),
                                                mx_uint(len(op_names)),
                                                c_str_array(op_names)))
   
   os.environ['MXNET_SUBGRAPH_BACKEND'] = 'default'
   
   ctx = mx.cpu()
   
   sym, arg_params, aux_params = mx.model.load_checkpoint('faster-rcnn', 0)
   mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
    mod.bind(for_training=False,
             data_shapes=[('data', (1,3,224,224))],
             label_shapes=mod._label_shapes)
   mod.set_params(arg_params, aux_params, allow_missing=True)
   
    fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
   img = mx.image.imread(fname)
   
   # convert into format (batch, RGB, width, height)
   img = mx.image.imresize(img, 224, 224) # resize
   img = img.transpose((2, 0, 1)) # Channel first
   img = img.expand_dims(axis=0) # batchify
   
   mod.forward(Batch([img]))
   print(mod.get_outputs())
   ```
   
   ## What have you tried to solve it?
   I've tested a fix in a private branch: https://github.com/samskalicky/incubator-mxnet/commit/517d29498059d081873d1bd160d95479a5c8cea9
   
