kuonangzhe opened a new issue #15646: TensorRT protobuf sizelimit and subgraph 
error
URL: https://github.com/apache/incubator-mxnet/issues/15646
 
 
   ## Description
   TensorRT inference reports a protobuf size-limit error for the SSD model and 
a subgraph (cycle) error for the Mask R-CNN model. Both models are loaded from GluonCV. 
   
   ## Environment info (Required)
   Ubuntu 18.04, CUDA 10.1, MXNet 1.5.0, TensorRT 5.0, GTX 1060ti, Python 3.6
   
   Compiler: ccache, nvcc
   
   MXNet commit hash: 75a9e187d00a8b7ebc71412a02ed0e3ae489d91f
   
   Build config:
   ```
   set -ex
   
   # Build ONNX
   pushd .
   echo "Installing ONNX."
   cd 3rdparty/onnx-tensorrt/third_party/onnx
   rm -rf build
   mkdir -p build
   cd build
   cmake \
       -DCMAKE_CXX_COMPILER_LAUNCHER=gcc \
       -DCMAKE_C_COMPILER_LAUNCHER=gcc \
       -DCMAKE_CXX_FLAGS=-I/usr/include/python3 \
       -DBUILD_SHARED_LIBS=ON .. \
       -G Ninja
   ninja -j 1 -v onnx/onnx.proto
   ninja -j 1 -v
   export LIBRARY_PATH=`pwd`:`pwd`/onnx/:$LIBRARY_PATH
   export CPLUS_INCLUDE_PATH=`pwd`:$CPLUS_INCLUDE_PATH
   popd
   
   # Build ONNX-TensorRT
   pushd .
   cd 3rdparty/onnx-tensorrt/
   rm -rf build
   mkdir -p build
   cd build
   cmake \
       -DCMAKE_CXX_COMPILER_LAUNCHER=gcc \
       -DCMAKE_C_COMPILER_LAUNCHER=gcc \
       ..
   make -j$(nproc)
   export LIBRARY_PATH=`pwd`:$LIBRARY_PATH
   popd
   
   mkdir -p ./lib/
   cp 3rdparty/onnx-tensorrt/third_party/onnx/build/*.so ./lib/
   cp -L 3rdparty/onnx-tensorrt/build/libnvonnxparser_runtime.so.0 ./lib/
   cp -L 3rdparty/onnx-tensorrt/build/libnvonnxparser.so.0 ./lib/
   ```
   ```
   rm -rf build
   mkdir -p build
   cd ./build
   cmake -DUSE_CUDA=1                            \
         -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache     \
         -DCMAKE_C_COMPILER_LAUNCHER=ccache        \
         -DCMAKE_CXX_COMPILER_LAUNCHER=ccache      \
         -DUSE_CUDNN=1                           \
         -DUSE_OPENCV=1                          \
         -DUSE_TENSORRT=1                        \
         -DUSE_OPENMP=1                          \
         -DUSE_MKLDNN=0                          \
         -DUSE_MKL_IF_AVAILABLE=OFF              \
         -DENABLE_TESTCOVERAGE=ON                \
         ..
   make
   ```
   ## Error Message:
   ssd_512_resnet50_v1_coco:
   ```
   [11:24:07] 
/home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:686: 
start to execute partition graph.
   [libprotobuf ERROR google/protobuf/io/coded_stream.cc:207] A protocol 
message was rejected because it was too big (more than 67108864 bytes).  To 
increase the limit (or to disable these warnings), see 
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
   Traceback (most recent call last):
     File "../incubator-mxnet/python/mxnet/symbol/symbol.py", line 1623, in 
simple_bind
       ctypes.byref(exe_handle)))
     File "../incubator-mxnet/python/mxnet/base.py", line 253, in check_call
       raise MXNetError(py_str(_LIB.MXGetLastError()))
   mxnet.base.MXNetError: Could not parse ONNX from string
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "test_resnet18.py", line 107, in <module>
       test_tensorrt_resnet18_feature_vect(model_name)
     File "test_resnet18.py", line 71, in test_tensorrt_resnet18_feature_vect
       grad_req='null', force_rebind=True)
     File "../incubator-mxnet/python/mxnet/symbol/symbol.py", line 1629, in 
simple_bind
       raise RuntimeError(error_msg)
   RuntimeError: simple_bind error. Arguments:
   data: (1, 3, 512, 512)
   force_rebind: True
   Could not parse ONNX from string
   ```
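
The 67108864 bytes quoted by libprotobuf is 64 MiB. One plausible reading (an assumption, not confirmed in this report) is that the serialized ONNX message carries the model weights and so grows past that limit. The sketch below estimates raw parameter storage from tensor shapes and compares it with the limit. The function names and the example shapes are hypothetical and serve only as illustration; real shapes would come from the exported model's parameters.

```python
# Limit quoted in the libprotobuf error message (64 MiB).
PROTOBUF_LIMIT_BYTES = 67108864


def count_elements(shape):
    """Multiply the dimensions of one tensor shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def estimate_param_bytes(param_shapes, bytes_per_element=4):
    """Sum the raw storage of all tensors (float32 is assumed by default)."""
    return sum(count_elements(shape) * bytes_per_element
               for shape in param_shapes.values())


def exceeds_protobuf_limit(param_shapes, limit=PROTOBUF_LIMIT_BYTES):
    """True if the raw parameter data alone is larger than the limit."""
    return estimate_param_bytes(param_shapes) > limit


# Hypothetical shapes, for illustration only (not taken from the SSD model).
example_shapes = {
    "conv0_weight": (64, 3, 7, 7),
    "fc_weight": (4096, 4096),
    "fc_bias": (4096,),
}

print(estimate_param_bytes(example_shapes))
print(exceeds_protobuf_limit(example_shapes))
```

This is only a lower bound on the serialized size, since it ignores graph structure and protobuf framing overhead.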
   mask_rcnn_resnet18_v1b_coco:
   ```
   [14:02:47] 
/home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:686: 
start to execute partition graph. 
   [14:02:47] 
/home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: 
Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__minus0. 
Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
   [14:02:47] 
/home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: 
Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__minus0. 
Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0_concat0, 
maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
   [14:02:47] 
/home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: 
Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__plus0. 
Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
   [14:02:47] 
/home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: 
Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__plus0. 
Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0_concat0, 
maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
   [14:02:47] 
/home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: 
Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__minus1. 
Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
   [14:02:47] 
/home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: 
Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__minus1. 
Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0_concat0, 
maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
   Traceback (most recent call last):
     File "test_resnet18.py", line 107, in <module>
       test_tensorrt_resnet18_feature_vect(model_name, batch_shape)
     File "test_resnet18.py", line 65, in test_tensorrt_resnet18_feature_vect
       trt_sym = sym.get_backend_symbol('TensorRT')
     File "../incubator-mxnet/python/mxnet/symbol/symbol.py", line 2564, in 
get_backend_symbol
       check_call(_LIB.MXGenBackendSubgraph(self.handle, c_str(backend), 
ctypes.byref(out)))
     File "../incubator-mxnet/python/mxnet/base.py", line 253, in check_call
       raise MXNetError(py_str(_LIB.MXGetLastError()))
   mxnet.base.MXNetError: [14:02:47] 
/home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:258: 
Check failed: excluded_node_id != static_cast<int>(snid) (152 vs. 152) : A 
cycle is found in the computational graph between nodes 
maskrcnn0_rpn0_bboxcornertocenter0__plus1 and 
maskrcnn0_rpn0_bboxcornertocenter0__plus1
   Stack trace:
     [bt] (0) 
/home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x57)
 [0x7f9485773f47]
     [bt] (1) 
/home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::sg::LabelSubgraph(nnvm::Graph
 const&, std::shared_ptr<mxnet::op::SubgraphSelectorV2>, int, unsigned long, 
std::vector<std::shared_ptr<mxnet::op::BiDirectedNode>, 
std::allocator<std::shared_ptr<mxnet::op::BiDirectedNode> > > const&, 
std::vector<mxnet::op::BiDirectedNode*, 
std::allocator<mxnet::op::BiDirectedNode*> >*, 
std::unordered_set<mxnet::op::BiDirectedNode const*, 
std::hash<mxnet::op::BiDirectedNode const*>, 
std::equal_to<mxnet::op::BiDirectedNode const*>, 
std::allocator<mxnet::op::BiDirectedNode const*> >*)+0x1fa3) [0x7f9486a466b3]
     [bt] (2) 
/home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::sg::PreSelectSubgraphNodes(nnvm::Graph
 const&, std::shared_ptr<mxnet::op::SubgraphSelectorV2>, int, unsigned long, 
std::vector<std::shared_ptr<mxnet::op::BiDirectedNode>, 
std::allocator<std::shared_ptr<mxnet::op::BiDirectedNode> > > const&, 
std::vector<mxnet::op::BiDirectedNode*, 
std::allocator<mxnet::op::BiDirectedNode*> >*)+0x18a) [0x7f9486a4798a]
     [bt] (3) 
/home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::sg::SelectSubgraphNodes(nnvm::Graph*,
 std::shared_ptr<mxnet::op::SubgraphSelectorV2>, 
std::vector<std::shared_ptr<mxnet::op::BiDirectedNode>, 
std::allocator<std::shared_ptr<mxnet::op::BiDirectedNode> > > const&, 
std::vector<std::vector<mxnet::op::BiDirectedNode*, 
std::allocator<mxnet::op::BiDirectedNode*> >, 
std::allocator<std::vector<mxnet::op::BiDirectedNode*, 
std::allocator<mxnet::op::BiDirectedNode*> > > >*, 
std::vector<std::shared_ptr<mxnet::op::SubgraphSelectorV2>, 
std::allocator<std::shared_ptr<mxnet::op::SubgraphSelectorV2> > >*, 
mxnet::op::BiDirectedNode const*, unsigned long, unsigned long*)+0x15e) 
[0x7f9486a4847e]
     [bt] (4) 
/home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::sg::FindSubgraphs(nnvm::Graph*,
 mxnet::op::SubgraphProperty const&, 
std::vector<std::shared_ptr<mxnet::op::BiDirectedNode>, 
std::allocator<std::shared_ptr<mxnet::op::BiDirectedNode> > > const&, 
std::vector<std::vector<mxnet::op::BiDirectedNode*, 
std::allocator<mxnet::op::BiDirectedNode*> >, 
std::allocator<std::vector<mxnet::op::BiDirectedNode*, 
std::allocator<mxnet::op::BiDirectedNode*> > > >*, 
std::vector<std::shared_ptr<mxnet::op::SubgraphSelectorV2>, 
std::allocator<std::shared_ptr<mxnet::op::SubgraphSelectorV2> > >*)+0x49a) 
[0x7f9486a48ffa]
     [bt] (5) 
/home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::BuildSubgraph(nnvm::Graph&&)+0x3c7)
 [0x7f9486a4c3a7]
     [bt] (6) 
/home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<nnvm::Graph
 (nnvm::Graph), nnvm::Graph (*)(nnvm::Graph&&)>::_M_invoke(std::_Any_data 
const&, nnvm::Graph&&)+0x29) [0x7f9485c4bcd9]
     [bt] (7) 
/home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(nnvm::ApplyPasses(nnvm::Graph,
 std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&)+0x43a) 
[0x7f9488018b1a]
     [bt] (8) 
/home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(nnvm::ApplyPass(nnvm::Graph,
 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> const&)+0x150) [0x7f94857fb4b0]
   ```
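
The "Found a cycle ... Excluding nodes ..., and retrying" messages reflect a general constraint of graph partitioning: if a set of nodes is merged into a single subgraph node, any path that leaves the set and later re-enters it turns into a cycle. The sketch below illustrates that check on a toy graph. It is not MXNet's implementation, and the function name and node names are hypothetical.

```python
from collections import defaultdict


def contraction_creates_cycle(edges, selected):
    """Return True if merging `selected` into one node would create a cycle.

    A cycle appears when some path leaves the selected set, passes through
    at least one unselected node, and then re-enters the selected set.
    """
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    # Start from unselected nodes that are fed directly by a selected node.
    stack = [dst for src in selected
             for dst in successors[src] if dst not in selected]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for nxt in successors[node]:
            if nxt in selected:
                return True  # the path re-enters the subgraph
            stack.append(nxt)
    return False


# Toy graph: a -> b -> c, plus a shortcut a -> c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c")]

print(contraction_creates_cycle(toy_edges, {"a", "c"}))       # b is left outside
print(contraction_creates_cycle(toy_edges, {"a", "b", "c"}))  # everything inside
```

In the toy graph, selecting `a` and `c` while leaving `b` outside creates the path subgraph -> `b` -> subgraph, which is the kind of situation a partitioner has to resolve by excluding nodes.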
   ## Minimum reproducible example
   The test script is adapted from tests/python/tensorrt/test_resnet18.py; the 
original, unmodified script runs fine. I use the master branch of GluonCV to load the pretrained models.  
   
   ## What have you tried to solve it?
   https://discuss.mxnet.io/t/tensorrt-test-letnet5-onnox-parser-error/4508/3
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
