qiliux opened a new issue #8100: mxnet cuda9.0 and cudnn 7 GPU slower than CPU
URL: https://github.com/apache/incubator-mxnet/issues/8100

For bugs or installation issues, please provide the following information. The more information you provide, the more likely people will be able to help you.

## Environment info

Operating System: Linux cbw-server 4.10.0-32-generic #36~16.04.1-Ubuntu SMP Wed Aug 9 09:19:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Compiler:
Package used (Python/R/Scala/Julia): Python
MXNet version: **mxnet-cu8.0**
Or if installed from source:
MXNet commit hash (`git rev-parse HEAD`):
If you are using the Python package, please provide Python version and distribution: Python 3.6.2 :: Anaconda, Inc.
If you are using the R package, please provide R `sessionInfo()`:

## Error Message:

Please paste the full error message, including stack trace.

## Minimum reproducible example

If you are using your own code, please provide a short script that reproduces the error.

## Steps to reproduce

Or if you are running standard examples, please provide the commands you have run that lead to the error.

1. Use the tutorial in the MXNet straight dope: chapter02, linear-regression-gluon.
2. Change `ctx` to GPU:

```
import mxnet as mx
import time
import mxnet.ndarray as nd
from mxnet import autograd, gluon

# ctx = mx.cpu()  # change cpu to gpu
ctx = mx.gpu(0)

num_inputs = 2
num_outputs = 1
num_examples = 100000

def real_fn(X):
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2

X = nd.random_normal(shape=(num_examples, num_inputs), ctx=ctx)
noise = .01 * nd.random_normal(shape=(num_examples,), ctx=ctx)
y = real_fn(X) + noise

batch_size = 10000
train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y),
                                   batch_size=batch_size, shuffle=True)

net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(1, in_units=2))
net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=ctx)

square_loss = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

tic = time.time()
epochs = 2
smoothing_constant = .01
for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (curr_loss if ((i == 0) and (e == 0))
                       else (1 - smoothing_constant) * moving_loss
                            + smoothing_constant * curr_loss)
    print("Epoch %s. Moving avg of MSE: %s" % (e, moving_loss))
toc = time.time()
print("GPU time: %s ms" % (1000 * (toc - tic)))
```

3. The GPU training time is slower than the CPU time.

## What have you tried to solve it?

1. Is it because I installed CUDA 9.0 and cuDNN 7.0?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
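For reference, the `moving_loss` update in the loop above is a standard exponential moving average of the per-batch loss, seeded with the first batch's raw loss. A minimal pure-Python sketch of just that update (independent of MXNet; the function name is illustrative):

```python
def update_moving_loss(moving_loss, curr_loss, smoothing_constant=0.01, first_batch=False):
    """Exponentially weighted moving average of the per-batch loss.

    On the first batch the average is seeded with the raw loss; after that,
    each new loss contributes a fraction `smoothing_constant` of the value.
    """
    if first_batch:
        return curr_loss
    return (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss

# Seed with the first batch's loss, then blend in the next one.
m = update_moving_loss(None, 4.0, first_batch=True)  # -> 4.0
m = update_moving_loss(m, 2.0)  # -> 0.99 * 4.0 + 0.01 * 2.0 = 3.98
```

With `smoothing_constant = 0.01`, the printed "moving avg of MSE" reacts slowly to per-batch noise, which is why it is reported instead of the raw batch loss.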
With regards, Apache Git Services
