CorneliusHagmeister opened a new issue #8959: Loss becomes nan when using correlation loss with MakeLoss on 2 images. URL: https://github.com/apache/incubator-mxnet/issues/8959

I am trying to use an image-similarity measure as the loss function for my network. For some reason, though, the loss I get is always nan. I have already reduced my network to a minimal example, yet I still get the error.

## Environment info

- python 3.4.5
- mxnet 0.12.1

## Error Message:

```
INFO:root:Epoch[0] Train-loss=nan
INFO:root:Epoch[0] Time cost=0.759
INFO:root:Epoch[1] Train-loss=nan
INFO:root:Epoch[1] Time cost=0.785
INFO:root:Epoch[2] Train-loss=nan
INFO:root:Epoch[2] Time cost=0.798
INFO:root:Epoch[3] Train-loss=nan
INFO:root:Epoch[3] Time cost=0.763
INFO:root:Epoch[4] Train-loss=nan
INFO:root:Epoch[4] Time cost=0.773
INFO:root:Epoch[5] Train-loss=nan
INFO:root:Epoch[5] Time cost=0.887
INFO:root:Epoch[6] Train-loss=nan
INFO:root:Epoch[6] Time cost=0.917
INFO:root:Epoch[7] Train-loss=nan
INFO:root:Epoch[7] Time cost=0.801
INFO:root:Epoch[8] Train-loss=nan
INFO:root:Epoch[8] Time cost=0.860
INFO:root:Epoch[9] Train-loss=nan
INFO:root:Epoch[9] Time cost=1.064
```

## Minimum reproducible example

```python
import mxnet as mx


def conv_net_regressor(image_shape, bn_mom=0.9):
    (nchannel, height, width) = image_shape
    # We have 2 data sources and concatenate them
    data_fixed = mx.sym.Variable(name='data_fixed')
    data_moving = mx.sym.Variable(name='data_moving')
    concat_data = mx.sym.concat(data_fixed, data_moving, dim=1)
    batched = mx.sym.BatchNorm(data=concat_data, fix_gamma=True, eps=2e-5,
                               momentum=bn_mom, name='bn_data')
    body = mx.sym.Convolution(data=concat_data, num_filter=20, kernel=(3, 3),
                              stride=(1, 1), pad=(0, 0), no_bias=True,
                              name="conv" + str(0))
    body = mx.sym.Activation(data=body, act_type='tanh', name='relu' + str(0))
    fc2 = mx.sym.FullyConnected(data=body, num_hidden=10)
    # fc2 = mx.sym.BlockGrad(fc2)
    cor = mx.sym.Correlation(data1=data_fixed, data2=data_moving)
    stnet = mx.sym.MakeLoss(cor, normalization='batch')
    return stnet


def get_symbol(image_shape):
    return conv_net_regressor(image_shape)


if __name__ == '__main__':
    mnist_shape = (1, 28, 28)
    iterators = get_mnist_data_iterator()
    net = get_symbol(mnist_shape)
    model = mx.mod.Module(symbol=net, context=mx.cpu(), label_names=None,
                          data_names=['data_fixed', 'data_moving'])
    # a = mx.viz.plot_network(net)
    # a.render()
    model.fit(iterators[0],
              optimizer='sgd',
              optimizer_params={'learning_rate': 0.1},
              eval_metric=mx.metric.Loss(),
              num_epoch=10)
```

I assume there is some mistake in my usage of MakeLoss, or my iterator doesn't work the way I expect. Maybe someone has an idea what's causing this. If you need more information, let me know.
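For context on how a correlation-style loss can turn into nan in the first place, here is a plain-numpy sketch (the `ncc` helper below is hypothetical and is **not** the `mx.sym.Correlation` operator): a normalized cross-correlation divides by the product of the two inputs' standard deviations, so any zero-variance input drives the denominator to zero and the result to nan unless a small epsilon guards the division.

```python
import numpy as np


def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two arrays (hypothetical helper).

    Centers both inputs, then divides the cross term by the product of
    their norms. `eps` guards against a zero denominator.
    """
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / (denom + eps))


np.random.seed(0)
flat = np.zeros((4, 4))       # constant image: zero variance
img = np.random.rand(4, 4)

# With the eps guard, the degenerate case collapses to 0 instead of nan.
print(ncc(flat, img))         # -> 0.0

# Without the guard, 0/0 occurs and the loss becomes nan -- the same
# symptom as in the training log above.
print(ncc(flat, img, eps=0.0))  # -> nan
```

This is only an illustration of the failure mode, not a claim about what `mx.sym.Correlation` computes internally; it may still be worth checking whether the correlation output is already nan before `MakeLoss` is applied.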
