indhub opened a new pull request #10437: [MXNET-171] Fix a bug that was causing 
training accuracy to be printed as nan sometimes
   ## Description ##
   Fix a bug (issue #4253) that was causing training accuracy to be printed as 
nan sometimes
   The "Epoch[x] Train-accuracy=xxx" line printed at the end of every epoch 
gives the impression that the accuracy metric is for the entire epoch. In 
reality, it is NOT.
   For example, if an epoch consists of 101 batches and metrics are printed 
every 10 batches because the callback was created that way, what is printed as 
'Train-accuracy=xxx' at the end of the epoch is actually just the accuracy from 
a single batch (the 101st batch). Printing this as 'Epoch[x] Train-accuracy=xxx' 
is very misleading.
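   The 101-batch example can be reproduced with a toy model. This is only a sketch: the `Accuracy` class and `run_epoch` helper below are hypothetical stand-ins, not MXNet code, and they assume the logging callback resets the metric right after it fires.

```python
class Accuracy:
    """Toy running-accuracy metric (hypothetical, not MXNet's own class)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, correct, total):
        self.correct += correct
        self.total += total

    def get(self):
        # An empty accumulator has no defined accuracy.
        return self.correct / self.total if self.total else float("nan")


def run_epoch(batches, frequent):
    """Accumulate (correct, total) pairs, resetting every `frequent` batches
    the way a periodic logging callback is assumed to."""
    metric = Accuracy()
    for nbatch, (correct, total) in enumerate(batches, start=1):
        metric.update(correct, total)
        if nbatch % frequent == 0:
            metric.reset()
    return metric.get()


# 100 batches at 50% accuracy, then a final batch at 100%.
batches = [(5, 10)] * 100 + [(10, 10)]
end_of_epoch = run_epoch(batches, frequent=10)
whole_epoch = sum(c for c, _ in batches) / sum(t for _, t in batches)
print(end_of_epoch, whole_epoch)
```

   Under these assumptions the value left in the metric at the end of the epoch reflects only batch 101, while the accuracy over the whole epoch is close to 50%.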
   Ideally we should remove this misleading print statement. However, I'm sure 
there are a lot of existing scripts out there that look for this statement, and 
we can't remove it without breaking those scripts. Since this would be a 
breaking change, let's do it in a major version change.
   For now, to avoid the nan error, we can skip resetting the metrics when 
processing the callback for the last batch. They will be reset at the beginning 
of the next epoch anyway.
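   A minimal sketch of why skipping the reset on the last batch avoids the nan, assuming the nan comes from reading a metric that was reset on the final batch and so holds zero samples. The function name and the `reset_on_last_batch` flag are hypothetical and are not part of MXNet's API.

```python
import math


def end_of_epoch_accuracy(batches, frequent, reset_on_last_batch):
    """Return the accuracy left in the accumulator after the last batch."""
    correct = total = 0
    last = len(batches)
    for nbatch, (c, t) in enumerate(batches, start=1):
        correct += c
        total += t
        if nbatch % frequent == 0:
            # Proposed behaviour: leave the metric alone on the last batch.
            if nbatch == last and not reset_on_last_batch:
                continue
            correct = total = 0
    return correct / total if total else float("nan")


# 100 batches, logging every 10: the callback fires on the very last batch.
batches = [(8, 10)] * 100
before = end_of_epoch_accuracy(batches, 10, reset_on_last_batch=True)
after = end_of_epoch_accuracy(batches, 10, reset_on_last_batch=False)
print(before, after)
```

   With the reset kept, the accumulator is empty when the end-of-epoch line is printed, so the result is nan. With the reset skipped, the printed value covers the batches since the previous reset (here batches 91 to 100), which is still not the whole epoch.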
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [x] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to 
the relevant JIRA issue created (except PRs with tiny changes)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage:
