kohr-h opened a new issue #13040: Import and GPU init extremely slow on Windows
URL: https://github.com/apache/incubator-mxnet/issues/13040
 
 
   ## Description
   
   Under Windows (10), `import mxnet` takes about 10 seconds, and `nd.array([1, 2, 3], ctx=gpu())` takes about 2 minutes. The two slowdowns may have a common cause.
   
   ## Environment info (Required)
   
   ```
   ----------Python Info----------
   Version      : 3.6.6
   Compiler     : MSC v.1900 64 bit (AMD64)
   Build        : ('default', 'Jun 28 2018 11:27:44')
   Arch         : ('64bit', 'WindowsPE')
   ------------Pip Info-----------
   Version      : 10.0.1
   Directory    : C:\Users\Holger\AppData\Local\conda\conda\envs\mxnet\lib\site-packages\pip
   ----------MXNet Info-----------
   Version      : 1.3.0
   Directory    : C:\Users\Holger\AppData\Local\conda\conda\envs\mxnet\lib\site-packages\mxnet
   Hashtag not found. Not installed from pre-built package.
   ----------System Info----------
   Platform     : Windows-10-10.0.17134-SP0
   system       : Windows
   node         : DESKTOP-3DBNGT7
   release      : 10
   version      : 10.0.17134
   ----------Hardware Info----------
   machine      : AMD64
   processor    : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
   Name
   Intel(R) Xeon(R) W-2175 CPU @ 2.50GHz
   
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0060 sec, LOAD: 2.6945 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0402 sec, LOAD: 0.7966 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0592 sec, LOAD: 0.7766 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0387 sec, LOAD: 1.0609 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0176 sec, LOAD: 0.5092 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0207 sec, LOAD: 0.0657 sec.
   ```
   
   ### Package used (Python/R/Scala/Julia)
   
   I've tested the following versions:
   - `pip install mxnet-cu92mkl`: See "Test code & output" below
   - `pip install --pre mxnet-cu92mkl` (installs `mxnet-cu92mkl-1.3.1b20180927`): Roughly the same result
   - `pip install mxnet-cu92`: Roughly the same result
   - `pip install mxnet-mkl` (CPU-only, no CUDA): Import time drops to **0.8 seconds**
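To compare import times across the packages listed above more reproducibly, each import can be timed in a fresh interpreter, so that modules already cached in `sys.modules` do not skew the result. The helper below is a minimal sketch; `time_fresh_import` is a hypothetical name, and subtracting the start-up time of an empty interpreter is an assumption about what should count as "import time".

```python
import subprocess
import sys
import time


def time_fresh_import(module_name, python=sys.executable):
    """Approximate seconds spent on `import <module_name>` in a new interpreter.

    Hypothetical helper: the start-up cost of an empty interpreter is measured
    separately and subtracted, so the result approximates the import alone.
    """
    def run(code):
        start = time.perf_counter()
        subprocess.run([python, "-c", code], check=True)
        return time.perf_counter() - start

    baseline = run("pass")
    total = run("import {}".format(module_name))
    return max(total - baseline, 0.0)


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "json"
    print("import {}: {:.2f} seconds".format(name, time_fresh_import(name)))
```

Running it as `python time_import.py mxnet` once per installed package variant would give directly comparable numbers (the script file name is arbitrary).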
   
   ## Test code & output
   
   ```python
   from contextlib import contextmanager
   from time import time
   
   @contextmanager
   def timeit(prefix):
       t = time()
       yield
       print("{}: {} seconds".format(prefix, time() - t))
   
   with timeit("import mxnet"):
       import mxnet
   
   with timeit("create GPU context"):
       ctx = mxnet.context.gpu()
   
   with timeit("create GPU array"):
       arr_gpu = mxnet.nd.array([1, 2, 3], ctx=ctx)
   
   with timeit("create second GPU array"):
       arr2_gpu = mxnet.nd.ones(3, ctx=ctx)
   
   with timeit("create third GPU array"):
       arr3_gpu = mxnet.nd.ones(3, ctx=ctx)
   
   with timeit("create fourth GPU array"):
       arr4_gpu = mxnet.nd.ones(3, ctx=ctx)
   
   with timeit("add GPU arrays"):
       sum_gpu = arr_gpu + arr2_gpu
   
   with timeit("transfer to CPU"):
       sum_cpu = sum_gpu.copyto(mxnet.context.cpu())
   
   with timeit("back to GPU"):
       sum_gpu_copy = sum_cpu.copyto(ctx)
   ```
   Output:
   ```
   import mxnet: 10.169919729232788 seconds
   create GPU context: 0.0 seconds
   create GPU array: 108.4664990901947 seconds
   create second GPU array: 2.2313592433929443 seconds
   create third GPU array: 0.0010004043579101562 seconds
   create fourth GPU array: 0.0010001659393310547 seconds
   add GPU arrays: 0.0 seconds
   transfer to CPU: 0.0009996891021728516 seconds
   back to GPU: 0.0 seconds
   ```
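MXNet queues NDArray operations on an asynchronous engine, so a Python-side timer may stop before the queued GPU work has finished; this could account for some of the near-zero entries above. In addition, `time.time()` can have coarse resolution on Windows, whereas `time.perf_counter()` is intended for measuring short intervals. The variant below is a sketch under those assumptions: the `sync` and `results` parameters are hypothetical additions, and with MXNet one would pass `sync=mxnet.nd.waitall`, which blocks until all pending operations complete.

```python
import time
from contextlib import contextmanager


@contextmanager
def timeit(prefix, sync=None, results=None):
    """Time the enclosed block with a high-resolution clock.

    If `sync` is given (e.g. mxnet.nd.waitall), it is called before the clock
    stops, so work queued asynchronously inside the block is included.
    If `results` is a dict, the elapsed time is also stored under `prefix`.
    """
    start = time.perf_counter()
    yield
    if sync is not None:
        sync()  # block until asynchronously queued work has finished
    elapsed = time.perf_counter() - start
    if results is not None:
        results[prefix] = elapsed
    print("{}: {:.6f} seconds".format(prefix, elapsed))
```

For example, `with timeit("add GPU arrays", sync=mxnet.nd.waitall): sum_gpu = arr_gpu + arr2_gpu` would include the time the addition actually takes on the GPU.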
   
   ## More info about import time
   
   I've run `python -X importtime -c "import mxnet"` under Python 3.7; [here](https://gist.github.com/kohr-h/460df3d90a665f564cada50468ef6b3a) is the output.
   The hotspot is clearly the import of `mxnet.base`. Few other heavy packages are loaded, so the slow part must be in `mxnet.base` itself. I suspect it is related to loading the DLLs, in particular the CUDA libraries.
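To rank modules by their own (self) import time rather than reading the raw `-X importtime` listing, the output can be parsed. The sketch below assumes the Python 3.7+ stderr format `import time: <self us> | <cumulative us> | <module>`; `parse_importtime` and `slowest_imports` are hypothetical helper names.

```python
import re
import subprocess
import sys

# Matches data lines such as "import time:      1234 |      5678 |   mxnet.base".
# The header line ("self [us] | cumulative | ...") does not match because its
# first two fields are not integers.
_LINE = re.compile(r"^import time:\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\S+)\s*$")


def parse_importtime(text):
    """Return (self_us, cumulative_us, module) tuples from -X importtime output."""
    rows = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            rows.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return rows


def slowest_imports(module_name, top=10):
    """Run `python -X importtime -c "import <module_name>"`, rank by self time."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", "import " + module_name],
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    rows = parse_importtime(proc.stderr)
    return sorted(rows, key=lambda r: r[0], reverse=True)[:top]
```

If the DLL loading happens in `mxnet.base` itself, as suspected above, `slowest_imports("mxnet")` would be expected to list it near the top.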
   
   
