nopattern opened a new issue #15482: mx2onnx error about batchnorm
URL: https://github.com/apache/incubator-mxnet/issues/15482

## Description

I use the mx2onnx converter `onnx_mxnet.export_model` to convert an MXNet symbol to ONNX. But the `moving_mean` and `moving_var` parameters of BatchNorm are not in the params dict, so the export fails with a `KeyError`.

## Environment info (Required)

```
----------Python Info----------
Version      : 3.6.8
Compiler     : GCC 5.4.0 20160609
Build        : ('default', 'May 7 2019 14:58:50')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /usr/local/lib/python3.6/dist-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/deep/workssd/mxnet/incubator-mxnet/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-148-generic-x86_64-with-Ubuntu-16.04-xenial
system       : Linux
node         : MS-7817
release      : 4.4.0-148-generic
version      : #174-Ubuntu SMP Tue May 7 12:20:14 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
Stepping:              3
CPU MHz:               3657.070
CPU max MHz:           3700.0000
CPU min MHz:           800.0000
BogoMIPS:              6600.45
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3
```

Package used (Python/R/Scala/Julia): (I'm using Python)

## Build info (Required if built from source)

Compiler (gcc):

MXNet commit hash: (da4b2a82511df)

Build config:

```
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

#-------------------------------------------------------------------------------
#  Template configuration for compiling mxnet
#
#  If you want to change the configuration, please use the following
#  steps. Assume you are on the root directory of mxnet. First copy the this
#  file so that any local changes will be ignored by git
#
#  $ cp make/config.mk .
#
#  Next modify the according entries, and then compile by
#
#  $ make
#
#  or build in parallel with 8 threads
#
#  $ make -j8
#-------------------------------------------------------------------------------

#---------------------
# choice of compiler
#--------------------

ifndef CC
export CC = gcc
endif
ifndef CXX
export CXX = g++
endif
ifndef NVCC
export NVCC = nvcc
endif

# whether compile with options for MXNet developer
DEV = 0

# whether compile with debug
DEBUG = 0

# whether to turn on segfault signal handler to log the stack trace
USE_SIGNAL_HANDLER =

# the additional link flags you want to add
ADD_LDFLAGS =

# the additional compile flags you want to add
ADD_CFLAGS =

#---------------------------------------------
# matrix computation libraries for CPU/GPU
#---------------------------------------------

# whether use CUDA during compile
USE_CUDA = 1

# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
USE_CUDA_PATH = /usr/local/cuda
#USE_CUDA_PATH = NONE

# whether to enable CUDA runtime compilation
ENABLE_CUDA_RTC = 1

# whether use CuDNN R3 library
USE_CUDNN = 1

# whether to use NVTX when profiling
USE_NVTX = 0

#whether to use NCCL library
USE_NCCL = 0
#add the path to NCCL library
USE_NCCL_PATH = NONE

# whether use opencv during compilation
# you can disable it, however, you will not able to use
# imbin iterator
USE_OPENCV = 1
# Add OpenCV include path, in which the directory `opencv2` exists
USE_OPENCV_INC_PATH = NONE
# Add OpenCV shared library path, in which the shared library exists
USE_OPENCV_LIB_PATH = NONE

#whether use libjpeg-turbo for image decode without OpenCV wrapper
USE_LIBJPEG_TURBO = 0
#add the path to libjpeg-turbo library
USE_LIBJPEG_TURBO_PATH = NONE

# use openmp for parallelization
USE_OPENMP = 1

# whether use MKL-DNN library: 0 = disabled, 1 = enabled
# if USE_MKLDNN is not defined, MKL-DNN will be enabled by default on x86 Linux.
# you can disable it explicity with USE_MKLDNN = 0
USE_MKLDNN = 0

# whether use NNPACK library
USE_NNPACK = 0

# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
# in default use atlas for linux while apple for osx
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S), Darwin)
USE_BLAS = apple
else
USE_BLAS = atlas
endif

# whether use lapack during compilation
# only effective when compiled with blas versions openblas/apple/atlas/mkl
USE_LAPACK = 1

# path to lapack library in case of a non-standard installation
USE_LAPACK_PATH =

# add path to intel library, you may need it for MKL, if you did not add the path
# to environment variable
USE_INTEL_PATH = NONE

# If use MKL only for BLAS, choose static link automatically to allow python wrapper
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
else
USE_STATIC_MKL = NONE
endif

#----------------------------
# Settings for power and arm arch
#----------------------------
ARCH := $(shell uname -a)
ifneq (,$(filter $(ARCH), armv6l armv7l powerpc64le ppc64le aarch64))
	USE_SSE=0
	USE_F16C=0
else
	USE_SSE=1
endif

#----------------------------
# F16C instruction support for faster arithmetic of fp16 on CPU
#----------------------------
# For distributed training with fp16, this helps even if training on GPUs
# If left empty, checks CPU support and turns it on.
# For cross compilation, please check support for F16C on target device and turn off if necessary.
USE_F16C =

#----------------------------
# distributed computing
#----------------------------

# whether or not to enable multi-machine supporting
USE_DIST_KVSTORE = 0

# whether or not allow to read and write HDFS directly. If yes, then hadoop is
# required
USE_HDFS = 0

# path to libjvm.so. required if USE_HDFS=1
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server

# whether or not allow to read and write AWS S3 directly. If yes, then
# libcurl4-openssl-dev is required, it can be installed on Ubuntu by
# sudo apt-get install -y libcurl4-openssl-dev
USE_S3 = 0

#----------------------------
# performance settings
#----------------------------
# Use operator tuning
USE_OPERATOR_TUNING = 1

# Use gperftools if found
# Disable because of #8968
USE_GPERFTOOLS = 0

# path to gperftools (tcmalloc) library in case of a non-standard installation
USE_GPERFTOOLS_PATH =

# Link gperftools statically
USE_GPERFTOOLS_STATIC =

# Use JEMalloc if found, and not using gperftools
USE_JEMALLOC = 1

# path to jemalloc library in case of a non-standard installation
USE_JEMALLOC_PATH =

# Link jemalloc statically
USE_JEMALLOC_STATIC =

#----------------------------
# additional operators
#----------------------------

# path to folders containing projects specific operators that you don't want to put in src/operators
EXTRA_OPERATORS =

#----------------------------
# other features
#----------------------------

# Create C++ interface package
USE_CPP_PACKAGE = 0

# Use int64_t type to represent the total number of elements in a tensor
# This will cause performance degradation reported in issue #14496
# Set to 1 for large tensor with tensor size greater than INT32_MAX i.e. 2147483647
# Note: the size of each dimension is still bounded by INT32_MAX
USE_INT64_TENSOR_SIZE = 0

# Python executable. Needed for cython target
PYTHON = python

#----------------------------
# plugins
#----------------------------

# whether to use caffe integration. This requires installing caffe.
# You also need to add CAFFE_PATH/build/lib to your LD_LIBRARY_PATH
# CAFFE_PATH = $(HOME)/caffe
# MXNET_PLUGINS += plugin/caffe/caffe.mk

#WARPCTC_PATH = $(HOME)/warp-ctc
WARPCTC_PATH = /home/deep/warp-ctc
MXNET_PLUGINS += plugin/warpctc/warpctc.mk

# whether to use sframe integration. This requires build sframe
# [email protected]:dato-code/SFrame.git
# SFRAME_PATH = $(HOME)/SFrame
# MXNET_PLUGINS += plugin/sframe/plugin.mk
```

## Error Message:

```
INFO:root:Converting idx: 0, op: null, name: data
INFO:root:Converting idx: 1, op: null, name: first-3x3-conv-conv2d_weight
INFO:root:Converting idx: 2, op: Convolution, name: first-3x3-conv-conv2d
INFO:root:Converting idx: 3, op: null, name: first-3x3-conv-batchnorm_gamma
INFO:root:Converting idx: 4, op: null, name: first-3x3-conv-batchnorm_beta
INFO:root:Converting idx: 5, op: null, name: first-3x3-conv-batchnorm_moving_mean
Traceback (most recent call last):
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 484, in <module>
    tune_and_evaluate(tuning_option)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 436, in tune_and_evaluate
    net, params, input_shape, _ = get_network(network, batch_size=1)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 93, in get_network
    return get_network_lpr_mb2(name,batch_size)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 143, in get_network_lpr_mb2
    test_onnx()
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 135, in test_onnx
    converted_model_path = onnx_mxnet.export_model(mx_sym, args, [input_shape], np.float32, onnx_file, True)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_model.py", line 87, in export_model
    verbose=verbose)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 256, in create_onnx_graph_proto
    idx=idx
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 92, in convert_layer
    return convert_func(node, **kwargs)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/_op_translations.py", line 170, in convert_weights_and_inputs
    np_arr = weights[name]
KeyError: 'first-3x3-conv-batchnorm_moving_mean'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 484, in <module>
    tune_and_evaluate(tuning_option)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 436, in tune_and_evaluate
    net, params, input_shape, _ = get_network(network, batch_size=1)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 93, in get_network
    return get_network_lpr_mb2(name,batch_size)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 143, in get_network_lpr_mb2
    test_onnx()
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 135, in test_onnx
    converted_model_path = onnx_mxnet.export_model(mx_sym, args, [input_shape], np.float32, onnx_file, True)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_model.py", line 87, in export_model
    verbose=verbose)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 256, in create_onnx_graph_proto
    idx=idx
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 92, in convert_layer
    return convert_func(node, **kwargs)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/_op_translations.py", line 170, in convert_weights_and_inputs
    np_arr = weights[name]
KeyError: 'first-3x3-conv-batchnorm_moving_mean'
```

## Minimum reproducible example

```
batch_size = 1
input_shape = (batch_size, 3, 512, 512)
output_shape = (batch_size, 65520, 14)
mx_sym, args, auxs = mx.model.load_checkpoint('./model/ssd_mobilenetv2_512', 18)
mx_sym = get_symbol('mobilenetv2', 512, num_classes=1, nms_thresh=0.5, force_nms=True, nms_topk=400)
onnx_file = './mxnet_exported_resnet18.onnx'
converted_model_path = onnx_mxnet.export_model(mx_sym, args, [input_shape], np.float32, onnx_file, True)
```

## Steps to reproduce

(Paste the commands you ran that produced the error.)

1. `python3 tran2onnx.py`

## What have you tried to solve it?

1. By debugging: the moving_mean and moving_var of BatchNorm are not in the params passed to the exporter, so the converter treats them as graph inputs, which is not correct.
2. There should be code that processes the moving_mean and moving_var of BatchNorm independently.
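A minimal sketch of a possible workaround, assuming the exporter only needs the auxiliary parameters (which hold the BatchNorm moving_mean/moving_var) merged into the dict passed to `export_model`; the checkpoint path, shapes, and file name below are taken from the reproducible example above:

```
import numpy as np
import mxnet as mx
from mxnet.contrib import onnx as onnx_mxnet

# Load the checkpoint: arg_params holds the trainable weights,
# aux_params holds the BatchNorm moving_mean / moving_var states.
mx_sym, arg_params, aux_params = mx.model.load_checkpoint('./model/ssd_mobilenetv2_512', 18)

# Merge both dicts so names like 'first-3x3-conv-batchnorm_moving_mean'
# are found as weights instead of being treated as graph inputs.
all_params = dict(arg_params)
all_params.update(aux_params)

input_shape = (1, 3, 512, 512)
onnx_file = './mxnet_exported_resnet18.onnx'
converted_model_path = onnx_mxnet.export_model(
    mx_sym, all_params, [input_shape], np.float32, onnx_file, verbose=True)
```

If this works, it would suggest the KeyError happens only because `export_model` was given `args` alone; handling the aux params inside the exporter (or documenting the merge requirement) would still be the cleaner fix.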
