atiyo opened a new issue #7637: Strange Validation and Training Losses at epoch change
URL: https://github.com/apache/incubator-mxnet/issues/7637
 
 
   I struggled to get some mxnet models to train to a good accuracy, so I took a closer look at the training and validation losses of a toy model. I noticed some strange spikes between epochs, which surprised me.
   
   I suspect I'm doing something wrong, but I can't see what: I have tried several optimisers with learning rates spanning several orders of magnitude. Being new to mxnet, it's most plausible that I'm making a basic mistake somewhere.
   
   The graphic below illustrates the phenomenon, followed by the code to reproduce the figure:
   
   
![adam_loss](https://user-images.githubusercontent.com/12828061/29753519-35b2e6e2-8b6b-11e7-8c08-14b8730efceb.png)
   
   
   ```python
   import mxnet as mx
   import numpy as np

   optimizer_choice = 'adam'
   learning_rate = 0.01
   batch_size = 500

   # Toy regression task: learn y = sin(x) on [0, 2*pi).
   inputs = np.expand_dims(np.random.uniform(low=0., high=2 * np.pi, size=10000), axis=1)
   labels = np.sin(inputs)

   eval_inputs = np.expand_dims(np.random.uniform(low=0., high=2 * np.pi, size=10000), axis=1)
   eval_labels = np.sin(eval_inputs)

   data_iter = mx.io.NDArrayIter(data={'data': inputs}, label={'label': labels},
                                 batch_size=batch_size, shuffle=True)
   eval_data_iter = mx.io.NDArrayIter(data={'data': eval_inputs}, label={'label': eval_labels},
                                      batch_size=batch_size, shuffle=True)

   # Small fully connected network; tanh on the output keeps predictions in (-1, 1).
   data = mx.sym.Variable('data')
   label = mx.sym.Variable('label')
   fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
   ac1 = mx.sym.Activation(data=fc1, act_type='relu')
   fc2 = mx.sym.FullyConnected(data=ac1, num_hidden=64)
   ac2 = mx.sym.Activation(data=fc2, act_type='relu')
   fc3 = mx.sym.FullyConnected(data=ac2, num_hidden=16)
   ac3 = mx.sym.Activation(data=fc3, act_type='relu')
   fc4 = mx.sym.FullyConnected(data=ac3, num_hidden=1)
   ac4 = mx.sym.Activation(data=fc4, act_type='tanh')
   loss = mx.symbol.LinearRegressionOutput(data=ac4, label=label)
   net = mx.module.Module(symbol=loss, data_names=['data'], label_names=['label'])

   train_error = []
   eval_error = []

   def log_error(period, log):
       # Record the current value of the (running) eval metric every `period` batches.
       def _callback(param):
           if param.nbatch % period == 0:
               name, value = param.eval_metric.get()
               log.append(value)
       return _callback

   optimizer_params = {'learning_rate': learning_rate}
   net.fit(data_iter,
           optimizer=optimizer_choice,
           optimizer_params=optimizer_params,
           eval_data=eval_data_iter,
           eval_metric='mse',
           num_epoch=5,
           epoch_end_callback=mx.callback.do_checkpoint('test_net'),
           eval_batch_end_callback=log_error(1, eval_error),
           batch_end_callback=log_error(1, train_error))

   train_error = np.array(train_error)
   eval_error = np.array(eval_error)

   import matplotlib.pyplot as plt
   plt.plot(np.arange(train_error.size), train_error, label='Training Error')
   plt.plot(np.arange(eval_error.size), eval_error, label='Validation Error')
   plt.legend(loc='upper right')
   plt.xlabel('Batch Number')
   plt.ylabel('Error')
   plt.title('Optimizer: {}. Learning Rate: {}'.format(optimizer_choice, learning_rate))
   plt.gca().set_ylim(bottom=0)
   plt.show()
   ```
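   One thing I wondered about, though I'm not sure it's the cause: `param.eval_metric.get()` returns a running average over the current epoch, and (as far as I understand) `fit()` resets the metric at the start of each epoch. If so, the first value logged in a new epoch reflects a single batch, which can sit well away from the heavily averaged value logged just before the boundary, producing a jump in the plot even when the underlying per-batch loss is smooth. A minimal numpy-only sketch of that averaging effect (the loss values here are synthetic, just for illustration):

   ```python
   import numpy as np

   rng = np.random.default_rng(0)
   num_epochs, batches_per_epoch = 3, 20

   # A smoothly decaying per-batch loss with a little noise.
   per_batch_loss = np.linspace(1.0, 0.1, num_epochs * batches_per_epoch)
   per_batch_loss = per_batch_loss + rng.normal(scale=0.02, size=per_batch_loss.size)

   logged = []
   for epoch in range(num_epochs):
       running_sum, count = 0.0, 0  # running metric resets at each epoch start
       for b in range(batches_per_epoch):
           running_sum += per_batch_loss[epoch * batches_per_epoch + b]
           count += 1
           logged.append(running_sum / count)  # what a per-batch callback would record

   logged = np.array(logged)
   # Discontinuity at the first epoch boundary: the new epoch's first logged
   # value is a single batch's loss, not an average over many batches.
   boundary_jump = abs(logged[batches_per_epoch] - logged[batches_per_epoch - 1])
   ```

   If this is what's happening, the jumps would be an artifact of how the metric is logged rather than of the optimisation itself.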
   ## Environment info
   Operating System: macOS
   
   MXNet version: 0.11.0
   
   Python version and distribution: Python 2.7.13
   
   
   
   
 