daveliepmann opened a new issue #14239: Clojure RNN example does not match 
performance of pretrained model
URL: https://github.com/apache/incubator-mxnet/issues/14239
 
 
   ## Description
   The Clojure RNN/LSTM example, when trained for the same 75 epochs as the 
pre-trained model, does not match the performance seen in (or expected from) 
that model.
   
   ## Environment info (Required)
   Apologies, I don't have this info, as the test was run on an on-demand AWS 
machine. (p2.xlarge, AWS Deep Learning AMI for Ubuntu)
   
   I'm using the Clojure contrib package.
   
   ## Error Message:
   N/A
   
   ## Minimum reproducible example
   
https://github.com/apache/incubator-mxnet/tree/master/contrib/clojure-package/examples/rnn
   
   ## Steps to reproduce
   
   1. Run the RNN example that ships with MXNet, but for 75 epochs (matching 
the pre-trained model) instead of the 2 used for demonstration
   2. Evaluate the output
   
   The pre-trained model, reportedly trained for 75 epochs on the Obama speech 
corpus using either the Python or Scala codebase, gives results like:
   >[The joke] of them war that this country dream. The American people can 
require medical bills and support for a few good service strong-skids.Meping 
prommastard edemach and John McCain. This
   
   I ran the Clojure example out of the box, changing only the number of 
epochs, and got results like this:
   >[The joke] thiptolty an whiend iomes funhilurld blonde ursk, wer orl orot. 
taced.MOuenckses t trora te ththay tioomones cato patorgilor dr isngr irelthes 
bey omoved. De sletor t, omesitorurieme time ro
   
   From [asking about this in 
Slack](https://the-asf.slack.com/archives/CEE9X9WN7/p1547638506024500) we 
identified two possible causes:
   
    - the Clojure example _may_ use different hyperparameters than the ones the 
pre-trained model was trained with: for instance, the Scala tutorial 
[docs](https://mxnet.incubator.apache.org/tutorials/scala/char_lstm.html) say 
the learning rate is 0.001 (not 0.01) and the weight decay is 0.00001 (not 
0.0001). However, the current Scala source uses the same values that the 
Clojure example has. It's unclear which codebase was used to create the 
pre-trained model, so there may be other differences as well.
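
   To make the discrepancy concrete, here is a quick sketch (plain Python, not 
the MXNet API; the dict keys are illustrative names) comparing the values the 
Scala tutorial docs list against the values in the current Scala/Clojure 
example source:

   ```python
   # Hyperparameters as reported above; key names are illustrative, not MXNet's.
   documented = {"learning_rate": 0.001, "wd": 0.00001}  # Scala tutorial docs
   in_source = {"learning_rate": 0.01, "wd": 0.0001}     # current Scala/Clojure examples

   # Each value in the example source is 10x the documented one.
   ratios = {k: in_source[k] / documented[k] for k in documented}
   print(ratios)
   ```

   A 10x difference in learning rate alone could plausibly account for 
degraded training, which is why it's worth pinning down which values the 
pre-trained model actually used.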
    - [per @gigasquid / 
Carin](https://the-asf.slack.com/archives/CEE9X9WN7/p1547661974032400), the 
BucketIterator was not ported over from Scala:
      ```
      ;;; in the case of this fixed bucketing that only uses one bucket size - 
it is the equivalent of padding all sentences to a fixed length.
      ;; we are going to use ndarray-iter for this
      ;; converting the bucketing-iter over to use is todo. We could either 
push for the example Scala one to be included in the base package and interop 
with that (which would be nice for other rnn needs too) or hand convert it over 
ourselves
      ```
      
([source](https://github.com/apache/incubator-mxnet/blob/master/contrib/clojure-package/examples/rnn/src/rnn/train_char_rnn.clj#L70))
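
   For context, here is a minimal plain-Python sketch (not the MXNet API) of 
the difference between the example's single-bucket approach — padding every 
sentence to one fixed length — and bucketing in the style of the Scala 
BucketIterator, where each sequence is padded only up to the smallest bucket 
that fits it:

   ```python
   def pad(seq, length, pad_id=0):
       """Pad a token sequence up to `length` with `pad_id`."""
       return seq + [pad_id] * (length - len(seq))

   def single_bucket(seqs):
       """Pad everything to the longest sequence (the Clojure example's approach)."""
       longest = max(len(s) for s in seqs)
       return [pad(s, longest) for s in seqs]

   def bucketed(seqs, buckets):
       """Assign each sequence to the smallest bucket that fits it, then pad to that bucket."""
       out = {b: [] for b in buckets}
       for s in seqs:
           b = min(x for x in buckets if x >= len(s))
           out[b].append(pad(s, b))
       return out

   seqs = [[1, 2], [1, 2, 3, 4], [5], [6, 7, 8, 9, 10, 11]]
   print(single_bucket(seqs))       # every sequence padded to length 6, the longest
   print(bucketed(seqs, [2, 4, 8])) # short sequences carry far less padding
   ```

   Less padding means less training signal wasted on pad tokens, so the 
missing BucketIterator is a plausible contributor to the quality gap, not just 
a performance nicety.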
   
   
   ## What have you tried to solve it?
   
   1. Ran a replication
   2. Asked in Slack
