anbrjohn opened a new issue #11352: CRF weights never updated in Bi-LSTM-CRF
URL: https://github.com/apache/incubator-mxnet/issues/11352
 
 
    When using `incubator-mxnet/example/gluon/lstm_crf.py`, CRF transition 
matrix weights are never updated during training, defeating the purpose of the 
CRF layer. Printing `model.transitions.data()` each epoch confirmed this.
   
   Compare the corresponding lines from the MXNet 
[version](https://github.com/apache/incubator-mxnet/blob/master/example/gluon/lstm_crf.py)
 and the PyTorch 
[reference](https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html):
   ```
   self.transitions = nd.random.normal(shape=(self.tagset_size, self.tagset_size))  # MXNet
   self.transitions = nn.Parameter(torch.randn(self.tagset_size, self.tagset_size))  # PyTorch
   ```
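   For context, a minimal sketch (using a hypothetical `Toy` block, not the example's actual class) of why the MXNet version never trains: a plain `NDArray` attribute is not registered as a `Parameter`, so `collect_params()` never sees it and the `Trainer` has nothing to update:
   ```
   import mxnet as mx
   from mxnet import gluon, nd

   class Toy(gluon.Block):
       def __init__(self, **kwargs):
           super(Toy, self).__init__(**kwargs)
           # Plain NDArray: NOT registered as a trainable parameter.
           self.transitions = nd.random.normal(shape=(3, 3))
           self.dense = gluon.nn.Dense(3)

   net = Toy()
   net.initialize()
   # Only the Dense weight/bias appear here; the transition matrix is
   # invisible to gluon.Trainer, so SGD never touches it.
   print(list(net.collect_params().keys()))
   ```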
   
   I was able to solve this issue by changing the above line to:
   ```
   self.transitions = gluon.Parameter("crf_transition_matrix", 
       shape=(self.tagset_size, self.tagset_size))
   ```
   Making this change required adding `.data()` to every other reference to 
`self.transitions` in the code, e.g.:
   ```
   self.transitions[next_tag].reshape((1, -1))  # Before
   self.transitions.data()[next_tag].reshape((1, -1))  # After
   ```
   and manually updating the parameter dictionary outside of the class before 
model initialization:
   ```
   model.params.update({'crf_transition_matrix':model.transitions}) # Added 
this line
   model.initialize(mx.init.Xavier(magnitude=2.24), ctx=mx.cpu())
   optimizer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': 
0.01, 'wd': 1e-4})
   ```
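   An alternative sketch (an assumption on my part, not the example's current code): if the parameter is created inside the block via `self.params.get(...)` within a `name_scope()`, Gluon registers it automatically, so the manual `model.params.update(...)` step should not be needed:
   ```
   import mxnet as mx
   from mxnet import gluon

   class BiLSTM_CRF(gluon.Block):  # class/attribute names are illustrative
       def __init__(self, tagset_size, **kwargs):
           super(BiLSTM_CRF, self).__init__(**kwargs)
           with self.name_scope():
               # Registered in self.params, hence found by collect_params().
               self.transitions = self.params.get(
                   'crf_transition_matrix',
                   shape=(tagset_size, tagset_size))

   model = BiLSTM_CRF(5)
   model.initialize(mx.init.Xavier(magnitude=2.24), ctx=mx.cpu())
   # The transition matrix now appears under collect_params(), so the
   # Trainer updates it during training without any manual dict edits.
   print(list(model.collect_params().keys()))
   ```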
