NRauschmayr opened a new issue #14938: Performance issue when use_bias=True
URL: https://github.com/apache/incubator-mxnet/issues/14938

## Description

I implemented a neural network that consists mainly of Dense layers. The same network implemented in PyTorch runs significantly faster: 40s per epoch instead of 120s in MXNet. Setting use_bias=False in the Dense layers gives a significant speedup: 35s per epoch instead of 120s. In the PyTorch implementation, bias=True versus bias=False makes no significant performance difference.

I created a small reproducible example:

```python
import mxnet as mx
from mxnet import gluon
import numpy as np
import time

class net(mx.gluon.HybridBlock):
    def __init__(self):
        super(net, self).__init__()
        self.fr = gluon.nn.HybridSequential()
        self.fr.add(gluon.nn.Dense(50, activation='relu', flatten=False, use_bias=True))
        #self.fr.add(gluon.nn.Dense(50, activation='relu', flatten=False, use_bias=False))
        self.fr.add(gluon.nn.Dense(10))
        self.init_matrices()

    def init_matrices(self):
        a = np.diagflat(np.arange(0, 10000))[0:100, :]
        shape = (100, 10000)
        a_init = mx.init.Constant(a.tolist())
        self.a = self.params.get('a', shape=shape, allow_deferred_init=False, init=a_init)

    def hybrid_forward(self, F, x, a):
        x = F.dot(x, a)
        x = x.transpose((0, 2, 1))
        return self.fr(x)

data = np.random.uniform(0, 10000, (100, 20, 100))
label = np.random.randint(0, 50, (100, 1))
label = label.reshape(100, 1)
batch_size = 100
ctx = mx.gpu()

model = net()
model.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
optimizer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': 1e-2})
loss = gluon.loss.SoftmaxCrossEntropyLoss()
model.hybridize(static_alloc=True, static_shape=True)

data = mx.nd.array(data, ctx=ctx)
label = mx.nd.array(label, ctx=ctx)

tic = time.time()
for i in range(100):
    with mx.autograd.record():
        out = model(data)
        l = loss(out, label)
    l.backward()
    optimizer.step(batch_size)
mx.nd.waitall()
print("Time: {}".format(time.time() - tic))
```

When I run the above example with MXNet 1.5.0 on a p3 instance, I get the following timings:

- with bias: 2.5s
- without bias: 1.1s
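For context on why this is surprising: a Dense layer with flatten=False applies y = x·Wᵀ + b along the last axis, so the bias is only a broadcast add over the leading axes and should be cheap relative to the matrix multiply. A minimal NumPy sketch of the computation (the shapes here are illustrative, not taken from the repro above):

```python
import numpy as np

# Dense(units, flatten=False) on input of shape (batch, seq, in_features):
# y = x @ W.T + b, with W of shape (units, in_features) and b of shape (units,).
batch, seq, in_features, units = 4, 8, 20, 50
rng = np.random.default_rng(0)
x = rng.uniform(size=(batch, seq, in_features))
W = rng.uniform(size=(units, in_features))
b = rng.uniform(size=(units,))

y_no_bias = x @ W.T        # shape (batch, seq, units)
y_with_bias = x @ W.T + b  # bias broadcast over batch and seq axes

assert y_with_bias.shape == (batch, seq, units)
```

Since adding b is an elementwise operation over the matmul output, the ~2x slowdown observed with use_bias=True suggests the bias path is hitting a less optimized kernel rather than doing fundamentally more work.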
