NRauschmayr opened a new issue #14938: Performance issue when use_bias=True
URL: https://github.com/apache/incubator-mxnet/issues/14938
 
 
   
   ## Description
   I implemented a neural network that consists mainly of many Dense layers. The same network implemented in PyTorch runs significantly faster: 40s per epoch instead of 120s in MXNet. Setting use_bias=False in the Dense layers gives a significant speedup: 35s per epoch instead of 120s. In the PyTorch implementation, bias=True versus bias=False makes no significant performance difference.
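   For context on what use_bias changes, a Dense layer with use_bias=True computes only one extra broadcast add on top of the matmul. A minimal NumPy sketch of the math (an illustration, not MXNet code):

```python
import numpy as np

# Sketch (assumption, not from the issue): the only extra work that
# use_bias=True does compared to use_bias=False is a single broadcast
# add of the bias vector onto the matmul output.
x = np.random.rand(4, 20).astype(np.float32)   # (batch, in_units)
W = np.random.rand(50, 20).astype(np.float32)  # (out_units, in_units)
b = np.random.rand(50).astype(np.float32)      # (out_units,)

y_no_bias = x @ W.T          # what use_bias=False computes
y_with_bias = x @ W.T + b    # use_bias=True: one extra broadcast add
print(y_with_bias.shape)     # (4, 50)
```

   Given how little extra arithmetic that is, a 3x slowdown from the bias term alone is unexpected.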
   
   I created a small reproducible example:  
```python
import time

import mxnet as mx
import numpy as np
from mxnet import gluon


class Net(mx.gluon.HybridBlock):
    def __init__(self):
        super(Net, self).__init__()
        self.fr = gluon.nn.HybridSequential()
        self.fr.add(gluon.nn.Dense(50, activation='relu', flatten=False, use_bias=True))
        # self.fr.add(gluon.nn.Dense(50, activation='relu', flatten=False, use_bias=False))
        self.fr.add(gluon.nn.Dense(10))
        self.init_matrices()

    def init_matrices(self):
        a = np.diagflat(np.arange(0, 10000))[0:100, :]
        shape = (100, 10000)
        a_init = mx.init.Constant(a.tolist())
        self.a = self.params.get('a', shape=shape, allow_deferred_init=False, init=a_init)

    def hybrid_forward(self, F, x, a):
        x = F.dot(x, a)
        x = x.transpose((0, 2, 1))
        return self.fr(x)


data = np.random.uniform(0, 10000, (100, 20, 100))
label = np.random.randint(0, 50, (100, 1))
batch_size = 100
ctx = mx.gpu()

model = Net()
model.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
optimizer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': 1e-2})
loss = gluon.loss.SoftmaxCrossEntropyLoss()

model.hybridize(static_alloc=True, static_shape=True)
data = mx.nd.array(data, ctx=ctx)
label = mx.nd.array(label, ctx=ctx)

tic = time.time()
for i in range(100):
    with mx.autograd.record():
        out = model(data)
        l = loss(out, label)
    l.backward()
    optimizer.step(batch_size)
mx.nd.waitall()
print("Time: {}".format(time.time() - tic))
```
   When I run the above example with MXNet 1.5.0 on a p3 instance, I get the following timings:
   - with bias: 2.5s
   - without bias: 1.1s
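
   As a rough sanity check on why a 2x+ gap is surprising, the bias add accounts for only a small fraction of the arithmetic in the first Dense layer. A back-of-the-envelope estimate using the shapes from the example above (the shape constants are derived from that code, not measured):

```python
# Back-of-the-envelope FLOP estimate for the first Dense(50, flatten=False)
# layer: its input after the transpose has shape (100, 10000, 20).
batch, rows, in_units, out_units = 100, 10000, 20, 50

matmul_flops = 2 * batch * rows * in_units * out_units  # one multiply + one add each
bias_flops = batch * rows * out_units                   # one add per output element
ratio = bias_flops / matmul_flops

print("bias adds are {:.2%} of the matmul FLOPs".format(ratio))  # 2.50%
```

   Since the bias add is ~2.5% of the layer's FLOPs, the slowdown looks like kernel/dispatch overhead on the use_bias=True path rather than the cost of the extra arithmetic itself.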
   
