karan6181 commented on issue #17485:
URL: https://github.com/apache/incubator-mxnet/issues/17485#issuecomment-764834200


   I did some analysis on different `batch_size` hyperparameter configurations:
   
   1. With `batch_size=1` per GPU, training plus validation (after every epoch) works without any issue.
   2. With `batch_size=2` per GPU, training works if validation is not run at all.
   3. With `batch_size=2` per GPU, training works but validation fails, regardless of whether validation runs after every epoch or only at the end of training.
   4. If we save the model params after training with `batch_size=1` per GPU and then run validation separately by loading those params with `batch_size=1` per GPU, it works. With `batch_size=2` per GPU, however, it doesn't work with the same loaded model params.
   
   Note: Validation doesn't support multi-image batches. It always runs with 1 image per GPU, irrespective of the `batch_size` value provided by the user.
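
To make the note concrete, here is a minimal sketch of how the number of images processed per step differs between training and validation. The helper name `images_per_step` and the 8-GPU figure are assumptions for illustration only; this is not code from the training script.

```python
def images_per_step(num_gpus, per_gpu_batch_size):
    """Return (train_images, val_images) processed per step across all GPUs.

    Hypothetical helper for illustration: training uses the user-supplied
    per-GPU batch size, while validation is pinned to 1 image per GPU,
    as described in the note above.
    """
    if num_gpus < 1 or per_gpu_batch_size < 1:
        raise ValueError("num_gpus and per_gpu_batch_size must be >= 1")
    train_images = num_gpus * per_gpu_batch_size
    val_images = num_gpus * 1  # validation ignores the user's batch_size
    return train_images, val_images


# Example with an assumed 8-GPU machine and batch_size=2 per GPU
train_images, val_images = images_per_step(num_gpus=8, per_gpu_batch_size=2)
print(train_images, val_images)
```

So with `batch_size=2` per GPU the training and validation steps see different batch shapes, whereas with `batch_size=1` per GPU they match. That lines up with cases 1 and 3 above, though I have not confirmed it is the root cause.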




