chandana1332 opened a new issue #15025: Gluon DataLoader incorrectly terminates the process pool in 1.4
URL: https://github.com/apache/incubator-mxnet/issues/15025
 
 
   ## Description
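   Gluon `DataLoader` terminates its process pool while a `_MultiWorkerIter` is still using it.
   Cause: https://github.com/apache/incubator-mxnet/pull/13537/files
   As seen in that patch, `DataLoader.__del__` terminates the process pool when the `DataLoader` is garbage collected. However, the pool has to outlive the `DataLoader`. The `_MultiWorkerIter` returned by `iter(loader)` keeps using the pool but holds no reference to the loader. If only the iterator is kept, the loader is collected, the pool is terminated, and the iterator's next `apply_async` call fails.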
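The lifetime problem can be reproduced without MXNet. The sketch below is a hypothetical, simplified model of the 1.4 design, not MXNet code:

- `Loader` stands in for `DataLoader` and `_Iter` for `_MultiWorkerIter`. Both are illustrative names.
- A `multiprocessing.pool.ThreadPool` replaces the process pool so the example runs anywhere.
- It assumes CPython, where reference counting collects the temporary loader as soon as `iter()` returns.

```python
from multiprocessing.pool import ThreadPool


def _double(x):
    # Hypothetical worker function standing in for the batchify/worker_fn call.
    return x * 2


class Loader:
    """Hypothetical stand-in for the 1.4 gluon.data.DataLoader."""

    def __init__(self, data, num_workers=2):
        self._data = list(data)
        self._worker_pool = ThreadPool(num_workers)

    def __iter__(self):
        # The iterator receives the pool, but no reference back to the loader.
        return _Iter(self._worker_pool, self._data)

    def __del__(self):
        # Mirrors the 1.4 change: terminate the pool when the loader is collected.
        self._worker_pool.terminate()


class _Iter:
    """Hypothetical stand-in for _MultiWorkerIter."""

    def __init__(self, pool, data):
        self._pool = pool
        self._data = data
        self._idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._idx >= len(self._data):
            raise StopIteration
        item = self._data[self._idx]
        self._idx += 1
        # Raises ValueError("Pool not running") once the pool is terminated.
        return self._pool.apply_async(_double, (item,)).get()


def reproduce():
    # Only the iterator is kept; the Loader temporary is collected right here.
    it = iter(Loader([1, 2, 3]))
    try:
        next(it)
    except ValueError as err:
        return str(err)
    return None


print(reproduce())  # Pool not running
```

Binding the loader to a name (`loader = Loader(...)`, then `iter(loader)`) avoids the error in this model, because the loader stays referenced while the iterator is used.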
   
   
   ## Environment info (Required)
   
   ```
   ----------Python Info----------
   Version      : 3.7.3
   Compiler     : Clang 9.0.0 (clang-900.0.39.2)
   Build        : ('default', 'Mar 27 2019 09:23:32')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 19.0.3
   Directory    : /usr/local/lib/python3.7/site-packages/pip
   ----------MXNet Info-----------
   ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
   ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
   Version      : 1.4.0
   Directory    : /usr/local/lib/python3.7/site-packages/mxnet
   Commit Hash   : a03d59ed867ba334d78d61246a1090cd1868f5da
   ----------System Info----------
   Platform     : Darwin-16.7.0-x86_64-i386-64bit
   system       : Darwin
   node         : 4c327592e11f.ant.amazon.com
   release      : 16.7.0
   version      : Darwin Kernel Version 16.7.0: Wed Apr 24 20:50:53 PDT 2019; root:xnu-3789.73.49~1/RELEASE_X86_64
   ----------Hardware Info----------
   machine      : x86_64
   processor    : i386
   b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
   b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT FPU_CSDS MD_CLEAR IBRS STIBP L1DF SSBD'
   b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
   b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz'
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0241 sec, LOAD: 0.6201 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0415 sec, LOAD: 0.5400 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0237 sec, LOAD: 0.4507 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0229 sec, LOAD: 0.2132 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0175 sec, LOAD: 0.4395 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0216 sec, LOAD: 0.1202 sec.
   ```
   
   I'm using Python 3.7
   
   ## Error Message:
   
     File "/usr/local/lib/python3.7/site-packages/mxnet/gluon/data/dataloader.py", line 435, in __next__
       self._push_next()
     File "/usr/local/lib/python3.7/site-packages/mxnet/gluon/data/dataloader.py", line 430, in _push_next
       self._worker_fn, (r, self._batchify_fn, self._dataset))
     File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 362, in apply_async
       raise ValueError("Pool not running")
   ValueError: Pool not running
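The last frame of the traceback is the standard library's check in `Pool.apply_async`: once `terminate()` has been called, further submissions are rejected with this `ValueError`.

The minimal sketch below shows that contract in isolation. It uses `multiprocessing.pool.ThreadPool`, which shares `Pool`'s `apply_async`/`terminate` API, so nothing has to be pickled. The function name `submit_after_terminate` is illustrative only.

```python
from multiprocessing.pool import ThreadPool


def submit_after_terminate():
    pool = ThreadPool(2)
    # Work submitted while the pool is running completes normally.
    assert pool.apply_async(abs, (-3,)).get() == 3
    # Equivalent to what DataLoader.__del__ does to its pool in 1.4.
    pool.terminate()
    try:
        pool.apply_async(abs, (-3,))
    except ValueError as err:
        return str(err)
    return None


print(submit_after_terminate())  # Pool not running
```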
   
   ## Minimum reproducible example
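       from mxnet import gluon, nd

       # Any dataset works; the original code also passed batchify_fn=func
       # and **self.loader_kwargs, omitted here.
       dataset = gluon.data.ArrayDataset(nd.arange(10))

       # Only the iterator is kept: the DataLoader temporary is garbage collected
       # as soon as iter() returns, and its __del__ terminates the worker pool.
       dl = iter(gluon.data.DataLoader(dataset, batch_size=2, num_workers=2))
       next(dl)  # expected on 1.4.0: ValueError: Pool not running

   The error occurs only with `num_workers > 0`; with `num_workers=0` no worker pool is created.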
   
   ## What have you tried to solve it?
   
   1. Commented out the following `__del__` method in `dataloader.py`:

          def __del__(self):
              if self._worker_pool:
                  # manually terminate due to a bug that pool is not automatically terminated
                  # https://bugs.python.org/issue34172
                  assert isinstance(self._worker_pool, multiprocessing.pool.Pool)
                  self._worker_pool.terminate()
   
   2. Used `DataLoaderV1` instead.
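Both workarounds avoid the early termination rather than fixing the ownership problem. One possible fix is for the iterator to hold a reference to the loader that owns the pool. Then the loader's `__del__` cannot run while any iterator over it is still alive.

The sketch below illustrates this in a simplified, hypothetical model, not necessarily the fix adopted upstream:

- `Loader` is a stand-in for `DataLoader` and `_Iter` for `_MultiWorkerIter`.
- A `ThreadPool` replaces the process pool.
- It assumes CPython reference counting.

```python
from multiprocessing.pool import ThreadPool


def _double(x):
    # Hypothetical worker function.
    return x * 2


class Loader:
    """Hypothetical DataLoader stand-in that owns a worker pool."""

    def __init__(self, data, num_workers=2):
        self._data = list(data)
        self._worker_pool = ThreadPool(num_workers)

    def __iter__(self):
        # Pass the loader itself so the iterator keeps it (and its pool) alive.
        return _Iter(self, self._data)

    def __del__(self):
        # Runs only once the loader and every iterator over it are unreachable.
        self._worker_pool.terminate()


class _Iter:
    """Hypothetical _MultiWorkerIter stand-in that keeps its loader referenced."""

    def __init__(self, loader, data):
        self._loader = loader  # strong reference prevents a premature __del__
        self._pool = loader._worker_pool
        self._data = data
        self._idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._idx >= len(self._data):
            raise StopIteration
        item = self._data[self._idx]
        self._idx += 1
        return self._pool.apply_async(_double, (item,)).get()


def demo():
    # Same pattern as the failing repro: only the iterator is kept.
    return list(iter(Loader([1, 2, 3])))


print(demo())  # [2, 4, 6]
```

The trade-off is that the pool now lives until the last iterator over it is released. A half-consumed, abandoned iterator therefore keeps the workers alive until that iterator is collected.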
   
