kuonangzhe opened a new issue #15646: TensorRT protobuf sizelimit and subgraph error
URL: https://github.com/apache/incubator-mxnet/issues/15646

## Description
TensorRT inference fails with a protobuf size-limit error for the SSD model and with a subgraph-partitioning error (a cycle in the computational graph) for the Mask R-CNN model. Both models are loaded from GluonCV.

## Environment info (Required)
Ubuntu 18.04, CUDA 10.1, MXNet 1.5.0, TensorRT 5.0, GTX 1060ti, Python 3.6

Compiler: ccache, nvcc

MXNet commit hash: 75a9e187d00a8b7ebc71412a02ed0e3ae489d91f

Build config:
```
set -ex

# Build ONNX
pushd .
echo "Installing ONNX."
cd 3rdparty/onnx-tensorrt/third_party/onnx
rm -rf build
mkdir -p build
cd build
cmake \
    -DCMAKE_CXX_COMPILER_LAUNCHER=gcc \
    -DCMAKE_C_COMPILER_LAUNCHER=gcc \
    -DCMAKE_CXX_FLAGS=-I/usr/include/python3\
    -DBUILD_SHARED_LIBS=ON ..\
    -G Ninja
ninja -j 1 -v onnx/onnx.proto
ninja -j 1 -v
export LIBRARY_PATH=`pwd`:`pwd`/onnx/:$LIBRARY_PATH
export CPLUS_INCLUDE_PATH=`pwd`:$CPLUS_INCLUDE_PATH
popd

# Build ONNX-TensorRT
pushd .
cd 3rdparty/onnx-tensorrt/
rm -rf build
mkdir -p build
cd build
cmake \
    -DCMAKE_CXX_COMPILER_LAUNCHER=gcc \
    -DCMAKE_C_COMPILER_LAUNCHER=gcc \
    ..
make -j$(nproc)
export LIBRARY_PATH=`pwd`:$LIBRARY_PATH
popd

mkdir -p ./lib/
cp 3rdparty/onnx-tensorrt/third_party/onnx/build/*.so ./lib/
cp -L 3rdparty/onnx-tensorrt/build/libnvonnxparser_runtime.so.0 ./lib/
cp -L 3rdparty/onnx-tensorrt/build/libnvonnxparser.so.0 ./lib/
```

```
rm -rf build
mkdir -p build
cd ./build
cmake -DUSE_CUDA=1 \
    -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache \
    -DCMAKE_C_COMPILER_LAUNCHER=ccache \
    -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
    -DUSE_CUDNN=1 \
    -DUSE_OPENCV=1 \
    -DUSE_TENSORRT=1 \
    -DUSE_OPENMP=1 \
    -DUSE_MKLDNN=0 \
    -DUSE_MKL_IF_AVAILABLE=OFF \
    -DENABLE_TESTCOVERAGE=ON \
    ..
make
```

## Error Message:
ssd_512_resnet50_v1_coco:
```
[11:24:07] /home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:686: start to execute partition graph.
[libprotobuf ERROR google/protobuf/io/coded_stream.cc:207] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
Traceback (most recent call last):
  File "../incubator-mxnet/python/mxnet/symbol/symbol.py", line 1623, in simple_bind
    ctypes.byref(exe_handle)))
  File "../incubator-mxnet/python/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Could not parse ONNX from string

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_resnet18.py", line 107, in <module>
    test_tensorrt_resnet18_feature_vect(model_name)
  File "test_resnet18.py", line 71, in test_tensorrt_resnet18_feature_vect
    grad_req='null', force_rebind=True)
  File "../incubator-mxnet/python/mxnet/symbol/symbol.py", line 1629, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 3, 512, 512)
force_rebind: True
Could not parse ONNX from string
```

mask_rcnn_resnet18_v1b_coco:
```
[14:02:47] /home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:686: start to execute partition graph.
[14:02:47] /home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__minus0. Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
[14:02:47] /home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__minus0. Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0_concat0, maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
[14:02:47] /home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__plus0. Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
[14:02:47] /home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__plus0. Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0_concat0, maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
[14:02:47] /home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__minus1. Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
[14:02:47] /home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:300: Found a cycle when BFS from node maskrcnn0_rpn0_bboxcornertocenter0__minus1. Excluding nodes maskrcnn0_rpn0_bboxcornertocenter0_concat0, maskrcnn0_rpn0_bboxcornertocenter0__plus1, and retrying
Traceback (most recent call last):
  File "test_resnet18.py", line 107, in <module>
    test_tensorrt_resnet18_feature_vect(model_name, batch_shape)
  File "test_resnet18.py", line 65, in test_tensorrt_resnet18_feature_vect
    trt_sym = sym.get_backend_symbol('TensorRT')
  File "../incubator-mxnet/python/mxnet/symbol/symbol.py", line 2564, in get_backend_symbol
    check_call(_LIB.MXGenBackendSubgraph(self.handle, c_str(backend), ctypes.byref(out)))
  File "../incubator-mxnet/python/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:02:47] /home/usrname/incubator-mxnet/src/operator/subgraph/build_subgraph.cc:258: Check failed: excluded_node_id != static_cast<int>(snid) (152 vs. 152) : A cycle is found in the computational graph between nodes maskrcnn0_rpn0_bboxcornertocenter0__plus1 and maskrcnn0_rpn0_bboxcornertocenter0__plus1
Stack trace:
  [bt] (0) /home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x57) [0x7f9485773f47]
  [bt] (1) /home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::sg::LabelSubgraph(nnvm::Graph const&, std::shared_ptr<mxnet::op::SubgraphSelectorV2>, int, unsigned long, std::vector<std::shared_ptr<mxnet::op::BiDirectedNode>, std::allocator<std::shared_ptr<mxnet::op::BiDirectedNode> > > const&, std::vector<mxnet::op::BiDirectedNode*, std::allocator<mxnet::op::BiDirectedNode*> >*, std::unordered_set<mxnet::op::BiDirectedNode const*, std::hash<mxnet::op::BiDirectedNode const*>, std::equal_to<mxnet::op::BiDirectedNode const*>, std::allocator<mxnet::op::BiDirectedNode const*> >*)+0x1fa3) [0x7f9486a466b3]
  [bt] (2) /home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::sg::PreSelectSubgraphNodes(nnvm::Graph const&, std::shared_ptr<mxnet::op::SubgraphSelectorV2>, int, unsigned long, std::vector<std::shared_ptr<mxnet::op::BiDirectedNode>, std::allocator<std::shared_ptr<mxnet::op::BiDirectedNode> > > const&, std::vector<mxnet::op::BiDirectedNode*, std::allocator<mxnet::op::BiDirectedNode*> >*)+0x18a) [0x7f9486a4798a]
  [bt] (3) /home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::sg::SelectSubgraphNodes(nnvm::Graph*, std::shared_ptr<mxnet::op::SubgraphSelectorV2>, std::vector<std::shared_ptr<mxnet::op::BiDirectedNode>, std::allocator<std::shared_ptr<mxnet::op::BiDirectedNode> > > const&, std::vector<std::vector<mxnet::op::BiDirectedNode*, std::allocator<mxnet::op::BiDirectedNode*> >, std::allocator<std::vector<mxnet::op::BiDirectedNode*, std::allocator<mxnet::op::BiDirectedNode*> > > >*, std::vector<std::shared_ptr<mxnet::op::SubgraphSelectorV2>, std::allocator<std::shared_ptr<mxnet::op::SubgraphSelectorV2> > >*, mxnet::op::BiDirectedNode const*, unsigned long, unsigned long*)+0x15e) [0x7f9486a4847e]
  [bt] (4) /home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::sg::FindSubgraphs(nnvm::Graph*, mxnet::op::SubgraphProperty const&, std::vector<std::shared_ptr<mxnet::op::BiDirectedNode>, std::allocator<std::shared_ptr<mxnet::op::BiDirectedNode> > > const&, std::vector<std::vector<mxnet::op::BiDirectedNode*, std::allocator<mxnet::op::BiDirectedNode*> >, std::allocator<std::vector<mxnet::op::BiDirectedNode*, std::allocator<mxnet::op::BiDirectedNode*> > > >*, std::vector<std::shared_ptr<mxnet::op::SubgraphSelectorV2>, std::allocator<std::shared_ptr<mxnet::op::SubgraphSelectorV2> > >*)+0x49a) [0x7f9486a48ffa]
  [bt] (5) /home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::BuildSubgraph(nnvm::Graph&&)+0x3c7) [0x7f9486a4c3a7]
  [bt] (6) /home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<nnvm::Graph (nnvm::Graph), nnvm::Graph (*)(nnvm::Graph&&)>::_M_invoke(std::_Any_data const&, nnvm::Graph&&)+0x29) [0x7f9485c4bcd9]
  [bt] (7) /home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(nnvm::ApplyPasses(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+0x43a) [0x7f9488018b1a]
  [bt] (8) /home/usrname/incubator-mxnet/python/mxnet/../../build/libmxnet.so(nnvm::ApplyPass(nnvm::Graph, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x150) [0x7f94857fb4b0]
```

## Minimum reproducible example
The test script is a modified version of `tests/python/tensorrt/test_resnet18.py`; the original script runs without errors. I use the master branch of GluonCV to load the pretrained models.

## What have you tried to solve it?
https://discuss.mxnet.io/t/tensorrt-test-letnet5-onnox-parser-error/4508/3
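The ssd_512_resnet50_v1_coco failure hits libprotobuf's default message limit of 67108864 bytes (64 MiB). The snippet below is a quick back-of-the-envelope check. It assumes the ONNX string that MXNet builds for the TensorRT subgraph stores the float32 weights inline as initializers. Under that assumption, the weights alone exceed the limit once the model has more than 16,777,216 float32 values (67108864 / 4). The sketch is plain Python and does not need MXNet. The function names and the example shape are hypothetical. It counts only raw tensor bytes and ignores graph/protobuf overhead, so the real serialized message is somewhat larger.

```python
# Default total-bytes limit reported in the libprotobuf error above (64 MiB).
PROTOBUF_DEFAULT_LIMIT = 67108864
FLOAT32_BYTES = 4

# Largest float32 element count whose raw bytes still fit under the limit.
MAX_FLOAT32_PARAMS = PROTOBUF_DEFAULT_LIMIT // FLOAT32_BYTES


def float32_payload_bytes(param_shapes):
    """Return the raw bytes needed to store float32 tensors of the given shapes.

    param_shapes: dict mapping a parameter name to a shape tuple (hypothetical
    input format). Protobuf/ONNX metadata overhead is not counted.
    """
    total = 0
    for shape in param_shapes.values():
        count = 1
        for dim in shape:
            count *= dim
        total += FLOAT32_BYTES * count
    return total


def exceeds_protobuf_limit(param_shapes, limit=PROTOBUF_DEFAULT_LIMIT):
    """True if the float32 weights alone are more than `limit` bytes."""
    return float32_payload_bytes(param_shapes) > limit


if __name__ == "__main__":
    # Hypothetical single weight tensor: 18,874,368 elements -> 75,497,472 bytes.
    shapes = {"example_conv_weight": (2048, 1024, 3, 3)}
    print(float32_payload_bytes(shapes), exceeds_protobuf_limit(shapes))
```

To run the check against a real model, you could build the shape dict from the `arg_params` dict of NDArrays returned by `mx.model.load_checkpoint`, for example `{k: v.shape for k, v in arg_params.items()}`. This is an illustration only. It does not show how the TensorRT integration actually serializes the subgraph.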
