qiliux opened a new issue #8100: mxnet cuda9.0 and cudnn 7 GPU slower than CPU
URL: https://github.com/apache/incubator-mxnet/issues/8100
 
 
   For bugs or installation issues, please provide the following information.
   The more information you provide, the more likely people will be able to help you.
   
   ## Environment info
   Operating System:
   Linux cbw-server 4.10.0-32-generic #36~16.04.1-Ubuntu SMP Wed Aug 9 09:19:02 
UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
   Compiler:
   
   Package used (Python/R/Scala/Julia):
   Python
   MXNet version:
   **mxnet-cu8.0**
   Or if installed from source:
   
   MXNet commit hash (`git rev-parse HEAD`):
   
   If you are using python package, please provide
   
   Python version and distribution:
   Python 3.6.2 :: Anaconda, Inc.
   If you are using R package, please provide
   
   R `sessionInfo()`:
   
   ## Error Message:
   Please paste the full error message, including stack trace.
   
   ## Minimum reproducible example
   If you are using your own code, please provide a short script that reproduces the error.
   
   ## Steps to reproduce
   Or if you are running standard examples, please provide the commands you have run that lead to the error.
   
   1. Use the tutorial from MXNet: The Straight Dope, chapter02 - linear-regression-gluon
   2. Change `ctx` to the GPU
   ```
    import time

    import mxnet as mx
    import mxnet.ndarray as nd
    from mxnet import autograd, gluon

    ctx = mx.gpu(0)  # change to mx.cpu() for the CPU run

    num_inputs = 2
    num_outputs = 1
    num_examples = 100000

    def real_fn(X):
        return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2

    X = nd.random_normal(shape=(num_examples, num_inputs), ctx=ctx)
    noise = .01 * nd.random_normal(shape=(num_examples,), ctx=ctx)
    y = real_fn(X) + noise

    batch_size = 10000
    train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y),
                                       batch_size=batch_size, shuffle=True)

    net = gluon.nn.Sequential()
    with net.name_scope():
        net.add(gluon.nn.Dense(1, in_units=2))
    net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=ctx)

    square_loss = gluon.loss.L2Loss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

    tic = time.time()
    epochs = 2
    smoothing_constant = .01

    for e in range(epochs):
        for i, (data, label) in enumerate(train_data):
            data = data.as_in_context(ctx)
            label = label.as_in_context(ctx)
            with autograd.record():
                output = net(data)
                loss = square_loss(output, label)
            loss.backward()
            trainer.step(batch_size)

            # exponentially weighted moving average of the batch loss
            curr_loss = nd.mean(loss).asscalar()
            moving_loss = (curr_loss if ((i == 0) and (e == 0))
                           else (1 - smoothing_constant) * moving_loss
                                + smoothing_constant * curr_loss)
        print("Epoch %s. Moving avg of MSE: %s" % (e, moving_loss))

    nd.waitall()  # flush pending asynchronous ops before stopping the clock
    toc = time.time()
    print("GPU time: %s ms" % (1000 * (toc - tic)))
   ```
   3. The GPU training time turns out to be longer than the CPU time.
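An aside on the measurement (not necessarily the cause): MXNet dispatches operations to its engine asynchronously, so a `time.time()` reading taken before calling `mx.nd.waitall()` can miss work that is still queued on the GPU. A toy pure-Python sketch of the effect, with a background thread standing in for the real engine (the `AsyncEngine` class here is illustrative, not MXNet API):

```python
import queue
import threading
import time

class AsyncEngine:
    """Toy model of an asynchronous executor: ops are enqueued and run
    on a background thread, so the caller returns immediately."""

    def __init__(self):
        self._q = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            op = self._q.get()
            op()                    # actually do the work
            self._q.task_done()

    def push(self, op):
        self._q.put(op)             # returns immediately

    def waitall(self):
        self._q.join()              # block until all queued work is done

engine = AsyncEngine()

tic = time.time()
for _ in range(5):
    engine.push(lambda: time.sleep(0.05))  # 5 x 50 ms of pretend "GPU work"
async_elapsed = time.time() - tic          # read without a barrier: too small

engine.waitall()
true_elapsed = time.time() - tic           # read after synchronizing

print("without barrier: %.3f s" % async_elapsed)
print("with barrier:    %.3f s" % true_elapsed)
```

In the script above, `nd.mean(loss).asscalar()` already forces a synchronization on every batch, but a final `nd.waitall()` before `toc` makes the end-to-end timing unambiguous.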
   
   ## What have you tried to solve it?
   
   1. Could the cause be that I installed CUDA 9.0 and cuDNN 7.0?
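A more likely explanation than the CUDA/cuDNN version: with only 2 input features, each training step launches tiny kernels, so the fixed per-operation launch and transfer overhead dominates and the GPU never amortizes its higher throughput. A back-of-the-envelope sketch with purely illustrative constants (none of these numbers are measurements):

```python
# Hypothetical per-device costs, for illustration only.
cpu_per_elem = 1e-9   # seconds of compute per element on CPU
gpu_per_elem = 5e-11  # GPU is ~20x faster per element...
gpu_overhead = 5e-5   # ...but pays a fixed launch/transfer cost per op

def step_time(n_elems, per_elem, overhead=0.0):
    """Simple cost model: fixed overhead plus per-element compute."""
    return overhead + n_elems * per_elem

small = 10000 * 2     # one batch of this issue's linear regression
large = 10_000_000    # a workload big enough to amortize the overhead

# Small batch: fixed overhead dominates, so the CPU wins.
print(step_time(small, cpu_per_elem) < step_time(small, gpu_per_elem, gpu_overhead))
# Large workload: compute dominates, so the GPU wins.
print(step_time(large, cpu_per_elem) > step_time(large, gpu_per_elem, gpu_overhead))
```

The crossover point depends on the hardware, but the shape of the argument holds: per-step workloads this small tend to favor the CPU regardless of toolkit version.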
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
