JlidiBorhen opened a new issue #17163: Performance deterioration when predicting in a Python `multithreading.Thread`
URL: https://github.com/apache/incubator-mxnet/issues/17163

## Description

I'm working with the face detection model in this repository: https://github.com/YonghaoHe/A-Light-and-Fast-Face-Detector-for-Edge-Devices/

When running in the main thread, there are no problems. But when I run prediction in a `threading.Thread`, I get inaccurate detections, and the bounding-box scale changes randomly, especially when the face is moving.

### Error Message

None

## To Reproduce

```python
import time
import threading

import predict  # from the repository linked above


class DetectionThread(threading.Thread):
    def __init__(self, parent, params):
        threading.Thread.__init__(self)
        print("Initializing detection thread...")
        # face detector
        self.parent = parent
        import mxnet as mx
        ctx = mx.gpu(0)
        from config_farm import configuration_10_320_20L_5scales_v2 as cfg
        symbol_file_path = 'symbol_farm/symbol_10_320_20L_5scales_v2_deploy.json'
        model_file_path = 'saved_model/configuration_10_320_20L_5scales_v2/train_10_320_20L_5scales_v2_iter_1000000.params'
        self.face_predictor = predict.Predict(
            mxnet=mx,
            symbol_file_path=symbol_file_path,
            model_file_path=model_file_path,
            ctx=ctx,
            receptive_field_list=cfg.param_receptive_field_list,
            receptive_field_stride=cfg.param_receptive_field_stride,
            bbox_small_list=cfg.param_bbox_small_list,
            bbox_large_list=cfg.param_bbox_large_list,
            receptive_field_center_start=cfg.param_receptive_field_center_start,
            num_output_scales=cfg.param_num_output_scales)

    def run(self):
        while not self.parent.isTerminated():
            # Wait until a frame unit is available or we are told to stop.
            unit = None
            while unit is None and not self.parent.isTerminated():
                unit = self.parent.getUnit(self)
                if unit is None:
                    # No units available yet
                    time.sleep(0.02)
            if unit is None:
                # Terminated while waiting for a unit
                break
            img = unit.getFrame()
            detection_img = img.copy()
            unit.release()
            bboxes, infer_time = self.face_predictor.predict(
                detection_img, resize_scale=0.5, score_threshold=0.6,
                top_k=10000, NMS_threshold=0.4, NMS_flag=True,
                skip_scale_branch_list=[])
            self.parent.bNewDetection = True
            if bboxes != []:
                self.parent.setDetections(bboxes, unit.getTimeStamp())
            time.sleep(0.02)
            if self.parent.isTerminated():
                break
```

### Steps to reproduce

(Paste the commands you ran that produced the error.)

1.
2.

## What have you tried to solve it?

1. I tried converting the model to ONNX to see if the issue would go away with TensorRT, but it produced the same behavior.
2. I tried multiprocessing (spawn) and the same thing happens.

## Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

```
curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
----------Python Info----------
Version      : 3.6.9
Compiler     : GCC 8.3.0
Build        : ('default', 'Nov 7 2019 10:44:02')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 9.0.1
Directory    : /usr/lib/python3/dist-packages/pip
----------MXNet Info-----------
Version      : 1.5.1
Directory    : /home/borhen/.local/lib/python3.6/site-packages/mxnet
Num GPUs     : 1
Commit Hash  : c9818480680f84daa6e281a974ab263691302ba8
----------System Info----------
Platform     : Linux-4.18.5-041805-generic-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : borhen-PC
release      : 4.18.5-041805-generic
version      : #201808241320 SMP Fri Aug 24 13:22:12 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               142
Model name:          Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
Stepping:            10
CPU MHz:             1893.932
CPU max MHz:         3400.0000
CPU min MHz:         400.0000
BogoMIPS:            3600.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0652 sec, LOAD: 0.6872 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0007 sec, LOAD: 0.6807 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0842 sec, LOAD: 0.5989 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0831 sec, LOAD: 0.1689 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0716 sec, LOAD: 0.2154 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0835 sec, LOAD: 0.4329 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0817 sec, LOAD: 1.3659 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0725 sec, LOAD: 0.2549 sec.
```
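A possible mitigation worth noting here: MXNet's imperative API is generally not guaranteed to be safe when called from multiple Python threads, so one common workaround is to route all frames through a queue to a single dedicated inference thread, which serializes every predict call. The sketch below illustrates that pattern only; `inference_worker` and `predict_fn` are hypothetical names, and `predict_fn` is a stand-in for the real `self.face_predictor.predict(...)` call.

```python
import queue
import threading

def inference_worker(predict_fn, frames, results):
    # A single dedicated thread owns the model, so every predict call
    # happens here and no two threads touch engine state at once.
    while True:
        frame = frames.get()
        if frame is None:  # sentinel: shut down the worker
            break
        results.put(predict_fn(frame))

# Hypothetical stand-in for self.face_predictor.predict(...)
def predict_fn(frame):
    return ["bbox-for-" + frame]

frames, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=inference_worker,
                          args=(predict_fn, frames, results))
worker.start()
for f in ("frame0", "frame1"):
    frames.put(f)   # producers only enqueue; they never call predict
frames.put(None)    # ask the worker to stop
worker.join()
```

Other threads (like `DetectionThread` above) would then only enqueue frames and read results, never touching the predictor directly.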
