johnbroughton2017 opened a new issue #9884: How to speed up prediction run time? Copying gpu->cpu takes a long time
URL: https://github.com/apache/incubator-mxnet/issues/9884

Hi all,

Prediction with MXNet has two major parts: the forward pass, and copying the results from GPU to CPU memory:

```
mod.forward(Batch([mx.nd.array(data)]))
prob = mod.get_outputs(0)[0][0].asnumpy()
```

I did some quick timings across batch sizes (see below). The second operation takes far longer than the forward pass, and the gap grows as the batch size increases.

| batch size | `mod.forward()` (ms) | `mod.get_outputs(...).asnumpy()` (ms) |
|-----------:|---------------------:|--------------------------------------:|
| 16         | 5.8                  | 30.1                                  |
| 32         | 10.5                 | 51.1                                  |
| 48         | 14                   | 78.7                                  |
| 64         | 17.8                 | 95.6                                  |
| 80         | 33.2                 | 121.3                                 |
| 96         | 36.2                 | 147.5                                 |
| 112        | 41.3                 | 174.3                                 |
| 128        | 46.4                 | 245.5                                 |
| 144        | 52                   | 219                                   |
| 160        | 56.9                 | 241.2                                 |
| 176        | 64.9                 | 267.4                                 |
| 192        | 69.5                 | 329.1                                 |
| 208        | 73.4                 | 317.1                                 |
| 224        | 80.7                 | 337.4                                 |
| 240        | 83.4                 | 446.7                                 |
| 256        | 93.4                 | 380.7                                 |

I don't understand this, because copying data from GPU to CPU should be really fast. For example, the following code takes only 0.1 ms to run:

```
# speed test: time a GPU -> CPU copy of a (256, 3, 224, 224) tensor
import time
import mxnet as mx

a = mx.nd.random_uniform(shape=(256, 3, 224, 224), ctx=mx.cpu())
b = mx.nd.random_uniform(shape=(256, 3, 224, 224), ctx=mx.gpu())

t0 = time.time()
b.copyto(a)
print(time.time() - t0)
```

Am I doing this the wrong way? Any help is highly appreciated. Thanks.

-- John
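One thing worth noting about the micro-benchmark above: MXNet executes operations asynchronously, so `b.copyto(a)` returns as soon as the copy is *enqueued*, not when it finishes, while `asnumpy()` blocks until the data is actually available on the CPU. That difference alone can explain why the bare `copyto` appears to take 0.1 ms. The following stdlib-only sketch (no MXNet, just a thread pool standing in for the async engine) illustrates how timing only the enqueue step under-reports the real cost; the `slow_copy` function and its 0.2 s sleep are hypothetical stand-ins for a device-to-host transfer:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_copy():
    # Hypothetical stand-in for a device-to-host transfer.
    time.sleep(0.2)
    return "done"

pool = ThreadPoolExecutor(max_workers=1)

# Timing only the call that enqueues the work: returns almost immediately,
# analogous to timing b.copyto(a) without synchronizing.
t0 = time.perf_counter()
fut = pool.submit(slow_copy)
enqueue_ms = (time.perf_counter() - t0) * 1000

# Timing until the result is actually available: captures the real cost,
# analogous to asnumpy() blocking until the data reaches the CPU.
t0 = time.perf_counter()
fut.result()
blocking_ms = (time.perf_counter() - t0) * 1000

print(f"enqueue: {enqueue_ms:.1f} ms, wait for result: {blocking_ms:.1f} ms")
pool.shutdown()
```

In MXNet terms, a fairer version of the copy benchmark would force synchronization (e.g. by calling `a.wait_to_read()` or `mx.nd.waitall()` after `copyto`) before stopping the clock.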