johnbroughton2017 opened a new issue #9884: How to speed up prediction run time? Copying gpu->cpu takes a long time
URL: https://github.com/apache/incubator-mxnet/issues/9884
 
 
   Hi all, 
   
   Doing prediction with MXNet has two major parts: the forward pass and copying the results from GPU to CPU memory, as in
   ```
   mod.forward(Batch([mx.nd.array(data)]))
   prob = mod.get_outputs(0)[0][0].asnumpy()
   ```
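To time the two steps separately, it can help to make the timer wait for any queued work before reading the clock. Below is a minimal, framework-agnostic sketch; `time_ms` and its `sync` parameter are hypothetical names for illustration, and with MXNet one would presumably pass a blocking call such as `mx.nd.waitall` as `sync`.

```python
import statistics
import time
from typing import Callable, Optional


def time_ms(fn: Callable[[], object],
            sync: Optional[Callable[[], object]] = None,
            repeats: int = 5,
            warmup: int = 1) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    `sync` (hypothetical hook) is called before the clock starts, to drain
    leftover work, and again after fn() returns but before the clock stops,
    so that work fn() merely queued is included in the measurement.
    """
    # Warm-up runs are not measured; first calls often pay one-off setup costs.
    for _ in range(warmup):
        fn()
        if sync is not None:
            sync()

    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # do not charge earlier queued work to this sample
        t0 = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for fn()'s queued work before stopping the clock
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload that sleeps about 2 ms; prints a value of roughly 2.
    print(round(time_ms(lambda: time.sleep(0.002)), 1))
```

With MXNet, usage would look roughly like `time_ms(lambda: mod.forward(batch), sync=mx.nd.waitall)`, where `mod` is the module from the snippet above and `batch` stands for `Batch([mx.nd.array(data)])`.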
   
   I did a quick timing of both steps for different batch sizes (see below). The second step takes several times longer than the forward pass at every batch size, and its cost grows as the batch size increases.
   
   Batch size   mod.forward() (ms)   get_outputs()...asnumpy() (ms)
   ----------   ------------------   ------------------------------
           16                  5.8                             30.1
           32                 10.5                             51.1
           48                 14.0                             78.7
           64                 17.8                             95.6
           80                 33.2                            121.3
           96                 36.2                            147.5
          112                 41.3                            174.3
          128                 46.4                            245.5
          144                 52.0                            219.0
          160                 56.9                            241.2
          176                 64.9                            267.4
          192                 69.5                            329.1
          208                 73.4                            317.1
          224                 80.7                            337.4
          240                 83.4                            446.7
          256                 93.4                            380.7
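To put the table in per-sample terms, an ordinary least-squares line can be fitted to each column. This is only a rough summary of the numbers above (the asnumpy column is noisy; for example, the 128 row is slower than the 144 row), not a model of what the GPU is actually doing. `fit_line` is a hypothetical helper written for this sketch.

```python
# Batch-size timings copied from the table above (milliseconds).
batch_sizes = [16, 32, 48, 64, 80, 96, 112, 128,
               144, 160, 176, 192, 208, 224, 240, 256]
forward_ms = [5.8, 10.5, 14.0, 17.8, 33.2, 36.2, 41.3, 46.4,
              52.0, 56.9, 64.9, 69.5, 73.4, 80.7, 83.4, 93.4]
asnumpy_ms = [30.1, 51.1, 78.7, 95.6, 121.3, 147.5, 174.3, 245.5,
              219.0, 241.2, 267.4, 329.1, 317.1, 337.4, 446.7, 380.7]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


for name, ys in [("forward", forward_ms), ("asnumpy", asnumpy_ms)]:
    slope, intercept = fit_line(batch_sizes, ys)
    # slope: extra milliseconds per additional sample in the batch
    print(f"{name}: {slope:.2f} ms per sample, {intercept:.1f} ms fixed")
```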
   
   I don't understand this, because copying data from GPU to CPU should be really fast. For example, the following code, which copies a (256, 3, 224, 224) array from GPU to CPU, takes only 0.1 ms to run:
   ```
   # speed test: time a single GPU -> CPU copy of a (256, 3, 224, 224) array
   import time
   import mxnet as mx
   a = mx.nd.random_uniform(shape=(256, 3, 224, 224), ctx=mx.cpu())
   b = mx.nd.random_uniform(shape=(256, 3, 224, 224), ctx=mx.gpu())
   
   t0 = time.time()
   b.copyto(a)  # copy the GPU array b into the CPU array a
   print(time.time() - t0)
   ```
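As a back-of-envelope check on the speed test, the sketch below computes how many bytes the copy moves and what bandwidth a 0.1 ms copy would imply. The link speed is an assumption (the issue does not say which GPU or bus is used); the roughly 16 GB/s theoretical peak of PCIe 3.0 x16 is used only as a common reference point.

```python
# Shape of the arrays in the speed test above; float32 is MXNet's default dtype.
shape = (256, 3, 224, 224)
bytes_per_element = 4

num_elements = 1
for dim in shape:
    num_elements *= dim
num_bytes = num_elements * bytes_per_element

measured_s = 0.1e-3  # the 0.1 ms reported above
implied_gb_per_s = num_bytes / measured_s / 1e9

# Assumption: a PCIe 3.0 x16 link, roughly 16 GB/s theoretical peak.
assumed_pcie_gb_per_s = 16.0
expected_ms = num_bytes / (assumed_pcie_gb_per_s * 1e9) * 1e3

print(f"{num_bytes / 2**20:.0f} MiB copied")
print(f"implied bandwidth: {implied_gb_per_s:.0f} GB/s")
print(f"time at {assumed_pcie_gb_per_s:.0f} GB/s: {expected_ms:.1f} ms")
```

Under these assumptions the copy moves 147 MiB, so 0.1 ms would correspond to roughly 1.5 TB/s, nearly 100 times the assumed link speed; at 16 GB/s the same copy would take about 10 ms.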
   
   Am I doing something wrong? Any help is highly appreciated. Thanks.
   
   -- John
   
