stephenrawls edited a comment on issue #15278: fixing var-seq-len rnn 
backward() operator
URL: https://github.com/apache/incubator-mxnet/pull/15278#issuecomment-503774511
 
 
   Just to keep the ticket updated:
   
   I have confirmed the following facts:
   
   1. If I set each sequence_length entry to the maximum sequence length, then the gradients of the reference net and the var-seq-len net do match.
   2. With cuDNN debug logging turned on, I can confirm that I *am* calling the appropriate "unpacked enabled" (padded I/O) versions of the cuDNN API, and that the appropriate seq-len values are passed in.
   I.e., I set:
   ```
   export CUDNN_LOGINFO_DBG=1
   export CUDNN_LOGDEST_DBG=/home/ec2-user/cudnn.dbg.log
   ```
   And I look at the resulting output and see:
   
   ```
   I! CuDNN (v7501) function cudnnRNNForwardTrainingEx() called:
   ...
   paddingMode: type=cudnnRNNPaddingMode_t; val=CUDNN_RNN_PADDED_IO_ENABLED (1);
   ...
   i!         seqLengthArray: type=int; val=[10,7,10,11,8,3,5,11,6,2];
   
   ```
   And this matches up with the corresponding calls to the backward functions, i.e.
   ```
   I! CuDNN (v7501) function cudnnRNNBackwardDataEx() called:
   ...
   paddingMode: type=cudnnRNNPaddingMode_t; val=CUDNN_RNN_PADDED_IO_ENABLED (1);
   ...
   seqLengthArray: type=int; val=[10,7,10,11,8,3,5,11,6,2];
   ```
   And same for cudnnRNNBackwardWeightsEx().
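   Under the hood, the check in point 1 boils down to an element-wise comparison of corresponding gradient arrays. A minimal NumPy sketch of that comparison (`grads_match` and the toy arrays here are hypothetical; the actual test compares MXNet NDArray gradients):

   ```
   import numpy as np

   def grads_match(ref_grads, var_grads, rtol=1e-5, atol=1e-7):
       # Compare corresponding gradient arrays from the reference net
       # and the var-seq-len net within a floating-point tolerance.
       return all(
           np.allclose(r, v, rtol=rtol, atol=atol)
           for r, v in zip(ref_grads, var_grads)
       )

   # With every sequence_length entry set to the max length, both nets
   # compute the same function, so their gradients should match.
   ref_grads = [np.array([0.1, 0.2]), np.array([[0.3], [0.4]])]
   var_grads = [g.copy() for g in ref_grads]
   print(grads_match(ref_grads, var_grads))  # True
   ```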
   
   My suspicion now is that the reference net's gradient may be losing floating-point precision because it goes through extra reverse / concat / etc. operations. I am going to consider another way of constructing the reference net for testing the gradient.
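   To illustrate the suspicion: reordering float32 arithmetic (as extra reverse/concat ops do) can perturb low-order bits, so exact gradient equality is too strict. A small self-contained NumPy demonstration (array size and tolerance are illustrative only, not taken from the actual test):

   ```
   import numpy as np

   rng = np.random.default_rng(0)
   x = rng.standard_normal(10_000).astype(np.float32)

   # Direct reduction vs. the same reduction after a reverse and a
   # split/concat round-trip: mathematically identical, but the
   # float32 additions happen in a different order.
   direct = x.sum(dtype=np.float32)
   roundabout = np.concatenate(np.split(x[::-1], 4)).sum(dtype=np.float32)

   # The two results agree only up to floating-point tolerance, which
   # is why gradient comparisons should use something like np.allclose
   # rather than exact equality.
   print(direct, roundabout, np.isclose(direct, roundabout, atol=1e-2))
   ```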

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
