AssassinTee opened a new issue #15891: Timeout on second predict
URL: https://github.com/apache/incubator-mxnet/issues/15891

## Description

I'm setting up a Flask server that loads my MXNet model and exposes a predict API method. While testing the API I noticed that the prediction hangs on the second call inside the MXNet API. By "timeout" I mean that Python gets stuck in one MXNet method and seems to run endlessly. I am using python-flask and MXNet v1.4.1. I tried upgrading to MXNet v1.5.0, but nothing changed and the error persists. I have already tried different implementations of the predict method (see below), but both hang. When I switch to a Keras backend everything works fine, but I need to use MXNet. I used this guide for the forward prediction: [https://mxnet.incubator.apache.org/versions/master/tutorials/python/predict_image.html](https://mxnet.incubator.apache.org/versions/master/tutorials/python/predict_image.html)

## Environment info (Required)

```
----------Python Info----------
Version      : 3.6.9
Compiler     : GCC 7.3.0
Build        : ('default', 'Jul 30 2019 19:07:31')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /home/<removed>/anaconda3/envs/openapi_flask/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/<removed>/anaconda3/envs/openapi_flask/lib/python3.6/site-packages/mxnet
Commit Hash  : 75a9e187d00a8b7ebc71412a02ed0e3ae489d91f
Library      : ['/home/<removed>/anaconda3/envs/openapi_flask/lib/python3.6/site-packages/mxnet/libmxnet.so']
Build features:
✖ CUDA ✖ CUDNN ✖ NCCL ✖ CUDA_RTC ✖ TENSORRT
✔ CPU_SSE ✔ CPU_SSE2 ✔ CPU_SSE3 ✔ CPU_SSE4_1 ✔ CPU_SSE4_2 ✖ CPU_SSE4A ✔ CPU_AVX ✖ CPU_AVX2
✖ OPENMP ✖ SSE ✔ F16C ✖ JEMALLOC
✖ BLAS_OPEN ✖ BLAS_ATLAS ✖ BLAS_MKL ✖ BLAS_APPLE ✔ LAPACK ✖ MKLDNN
✔ OPENCV ✖ CAFFE ✖ PROFILER ✔ DIST_KVSTORE ✖ CXX14 ✖ INT64_TENSOR_SIZE ✔ SIGNAL_HANDLER ✖ DEBUG
----------System Info----------
Platform     : Linux-4.15.0-52-generic-x86_64-with-debian-buster-sid
system       : Linux
node         : marvin-Latitude-5590
release      : 4.15.0-52-generic
version      : #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 142
Model name:            Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
Stepping:              10
CPU MHz:               1197.995
CPU max MHz:           3600.0000
CPU min MHz:           400.0000
BogoMIPS:              3792.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0061 sec, LOAD: 0.7382 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0363 sec, LOAD: 0.8136 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0551 sec, LOAD: 0.8109 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0602 sec, LOAD: 0.7776 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0186 sec, LOAD: 0.5304 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0203 sec, LOAD: 0.1319 sec.
```

Package used (Python/R/Scala/Julia): the pip package (both 1.4.1 and 1.5.0; neither worked)

## Build info (Required if built from source)

Python 3.6 (from command) with Flask

## Error Message:

Nothing; the program just hangs.

## Minimum reproducible example

```python
import mxnet as mx
from collections import namedtuple

Batch = namedtuple('Batch', ['data'])


class MxnetBackend:
    def __init__(self):
        print("MXNet Version:", mx.__version__)
        # Load the serialized symbol and parameters from a checkpoint.
        self.sym, self.arg_params, self.aux_params = mx.model.load_checkpoint(
            prefix='models/kc_mxnet', epoch=0)
        self.mod = mx.mod.Module(symbol=self.sym,
                                 data_names=['/dense_1_input1'],
                                 context=mx.cpu(),
                                 label_names=None)
        self.mod.bind(for_training=False,
                      data_shapes=[('/dense_1_input1', (1, 1, 512, 3010))],
                      label_shapes=self.mod._label_shapes)
        self.mod.set_params(self.arg_params, self.aux_params, allow_missing=True)

    def predict(self, X):  # this hangs on the second call
        """Takes an ndarray, returns an ndarray."""
        X = mx.nd.array(X)
        # The 'data' field of Batch must be a list of NDArrays.
        self.mod.forward(Batch([X]))
        return self.mod.get_outputs()[0].asnumpy()

    def predict2(self, X):  # this hangs on the second call, too
        """Takes an ndarray, returns an ndarray."""
        return self.mod.predict(mx.nd.array(X)).asnumpy()
```

## Steps to reproduce

(Paste the commands you ran that produced the error.)

1. Use the backend to predict a label.
2. Predict another label.
3. Wait for something to happen.

## What have you tried to solve it?

1. Used multiple implementations of the predictor (see `predict` and `predict2` above).
2. Updated MXNet to the latest version (from 1.4.1 to 1.5.0).
3. Asked on Stack Overflow.
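One possible cause worth ruling out (an assumption, not a confirmed diagnosis): Flask's development server can dispatch requests on different threads or, with the debug reloader, even re-import the app in a child process, while an MXNet `Module` is not safe to call concurrently. A minimal sketch of serializing all inference through a single lock is below; `DummyBackend` is a stand-in for the `MxnetBackend` above so the pattern can be shown without loading a model.

```python
import threading


class SerializedPredictor:
    """Wraps a predictor that is not thread-safe so that only one
    thread runs inference at a time. ``backend`` is any object with
    a ``predict(X)`` method, e.g. the MxnetBackend above."""

    def __init__(self, backend):
        self._backend = backend
        self._lock = threading.Lock()

    def predict(self, X):
        # Serialize access: concurrent forward() calls on one Module
        # are a plausible source of the observed hang.
        with self._lock:
            return self._backend.predict(X)


class DummyBackend:
    """Stand-in for MxnetBackend, used only for demonstration."""

    def predict(self, X):
        return [x * 2 for x in X]


predictor = SerializedPredictor(DummyBackend())
print(predictor.predict([1, 2, 3]))  # [2, 4, 6]
print(predictor.predict([4, 5]))    # [8, 10] -- second call must not hang
```

It may also be worth trying `app.run(threaded=False, use_reloader=False)` when starting Flask, so that a single process and thread handle all requests; whether that resolves the hang here is an open question.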
