chandana1332 opened a new issue #15025: Gluon DataLoader incorrectly terminates the process pool in 1.4
URL: https://github.com/apache/incubator-mxnet/issues/15025

## Description

Gluon DataLoader terminates the process pool early, while _MultiWorkerIter is still operating on the pool.

Cause: https://github.com/apache/incubator-mxnet/pull/13537/files

As the patch shows, the process pool is terminated when the DataLoader is garbage collected. However, the pool's lifetime extends beyond the DataLoader: the _MultiWorkerIter created from it keeps submitting work to the same pool.

## Environment info (Required)

```
----------Python Info----------
Version : 3.7.3
Compiler : Clang 9.0.0 (clang-900.0.39.2)
Build : ('default', 'Mar 27 2019 09:23:32')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.0.3
Directory : /usr/local/lib/python3.7/site-packages/pip
----------MXNet Info-----------
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Version : 1.4.0
Directory : /usr/local/lib/python3.7/site-packages/mxnet
Commit Hash : a03d59ed867ba334d78d61246a1090cd1868f5da
----------System Info----------
Platform : Darwin-16.7.0-x86_64-i386-64bit
system : Darwin
node : 4c327592e11f.ant.amazon.com
release : 16.7.0
version : Darwin Kernel Version 16.7.0: Wed Apr 24 20:50:53 PDT 2019; root:xnu-3789.73.49~1/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT FPU_CSDS MD_CLEAR IBRS STIBP L1DF SSBD'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0241 sec, LOAD: 0.6201 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0415 sec, LOAD: 0.5400 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0237 sec, LOAD: 0.4507 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0229 sec, LOAD: 0.2132 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0175 sec, LOAD: 0.4395 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0216 sec, LOAD: 0.1202 sec.
```

I'm using Python 3.7

## Error Message:

      File "/usr/local/lib/python3.7/site-packages/mxnet/gluon/data/dataloader.py", line 435, in __next__
        self._push_next()
      File "/usr/local/lib/python3.7/site-packages/mxnet/gluon/data/dataloader.py", line 430, in _push_next
        self._worker_fn, (r, self._batchify_fn, self._dataset))
      File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 362, in apply_async
        raise ValueError("Pool not running")
    ValueError: Pool not running

## Minimum reproducible example

    dl = iter(
        gluon.data.DataLoader(
            dataset, batchify_fn=func, **self.loader_kwargs
        )
    )

with `num_workers > 0`. The DataLoader here is a temporary that nothing references once `iter()` returns, so it can be garbage collected while `dl` is still in use.

## What have you tried to solve it?

1. Commented out the following in dataloader.py:

        def __del__(self):
            if self._worker_pool:
                # manually terminate due to a bug that pool is not automatically terminated
                # https://bugs.python.org/issue34172
                assert isinstance(self._worker_pool, multiprocessing.pool.Pool)
                self._worker_pool.terminate()

2. Used DataLoaderV1 instead.
