I would like to answer your questions in reverse order…
5. How does maximum entropy work?
See "A Maximum Entropy Approach to Natural Language Processing", Berger, Della
Pietra, Della Pietra, Computational Linguistics 22:1 (just google it…)
In a nutshell, if you have no information, all outcomes are equally likely.
Every training case (Berger calls these constraints) changes the probability of
an outcome.
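A toy illustration (numbers made up): with three possible outcomes and no
features, the model gives every outcome probability 1/3. If the training data
then shows that the predicate "goal" appears with the outcome SPORTS in, say,
80% of the cases where it occurs, the trained model has to reproduce that
statistic while staying as close to uniform as possible everywhere else, so for
contexts containing "goal" probability mass shifts toward SPORTS and away from
the other outcomes.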
4. What happens during training?
(Assuming GIS training) Each predicate (feature, word) is assigned a weight for
each outcome it co-occurs with. The weights are chosen to maximize the
likelihood of classifying the training cases correctly.
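As a made-up example of what that weight table looks like: in a two-outcome
model over POSITIVE and NEGATIVE, the predicate "excellent" might end up with a
weight around 3 for POSITIVE and around 0.3 for NEGATIVE, while a word like
"the", which tells you nothing, stays near 1 for both. GIS settles on values
like these because they make the training cases it has seen as probable as
possible.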
3. How is a test case classified? For each predicate/outcome pair there is a
weight. For each outcome, the weights of the predicates present in your test
case are multiplied together, and the outcome with the highest product is
selected. Note that the scores are normalized so that they sum to one.
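Roughly, in code (a toy sketch with made-up weights, not OpenNLP's internal
API):

    public class MaxentScoreSketch {
        // weight for each (predicate, outcome) pair; 1.0 means "no effect"
        // (toy values invented for illustration, not learned from data)
        static double weight(String predicate, String outcome) {
            if (predicate.equals("goal") && outcome.equals("SPORTS")) return 3.0;
            if (predicate.equals("vote") && outcome.equals("POLITICS")) return 4.0;
            return 1.0;
        }

        public static void main(String[] args) {
            String[] outcomes = {"SPORTS", "POLITICS"};
            String[] testPredicates = {"goal", "today"};

            double[] score = new double[outcomes.length];
            double total = 0.0;
            for (int i = 0; i < outcomes.length; i++) {
                score[i] = 1.0;
                for (String p : testPredicates) {
                    score[i] *= weight(p, outcomes[i]); // product of the weights
                }
                total += score[i];
            }
            for (int i = 0; i < outcomes.length; i++) {
                // normalize so the outcome probabilities sum to one
                System.out.println(outcomes[i] + " -> " + score[i] / total);
            }
        }
    }

(Real implementations usually work with sums of logs rather than raw products
to avoid underflow; the ranking of the outcomes comes out the same.)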
2. How are the log likelihood and probability calculated at each step? I am
guessing something like a running sum of the log of (the product of the weights
of the training case's predicates for its correct outcome) / (the normalizing
sum of those products over all outcomes). You should check the code.
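In code, that guess looks roughly like this (the prob() helper is hypothetical
and stands in for the normalized score from point 3):

    import java.util.List;

    public class LogLikelihoodSketch {
        // hypothetical helper: normalized probability of an outcome given the
        // predicates of a training case, as sketched in point 3 above
        static double prob(List<String> predicates, String correctOutcome) {
            return 0.5; // placeholder value just so the sketch runs
        }

        public static void main(String[] args) {
            // (predicates, correct outcome) pairs standing in for training cases
            List<List<String>> cases =
                List.of(List.of("goal", "today"), List.of("vote", "senate"));
            List<String> correct = List.of("SPORTS", "POLITICS");

            double logLikelihood = 0.0;
            for (int i = 0; i < cases.size(); i++) {
                // running sum of the log probability of the correct outcome
                logLikelihood += Math.log(prob(cases.get(i), correct.get(i)));
            }
            System.out.println("log-likelihood = " + logLikelihood);
        }
    }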
1. What is happening during each iteration [of the training]? The weights are
initialized to a value of 0. Kind of useless, eh? So each iteration improves
the values of the weights based on your training data. For more info, Manning
and Schütze's Foundations of Statistical Natural Language Processing has a good
description of GIS.
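For what one iteration does to a single weight, here is a rough sketch in the
multiplicative view of the weights (the expected-count line is a made-up
stand-in for what the trainer actually computes by scoring every training case
as in point 3):

    public class GisUpdateSketch {
        public static void main(String[] args) {
            // one (predicate, outcome) weight; 1.0 here corresponds to the
            // initial value of 0 in log space mentioned above (no effect yet)
            double weight = 1.0;
            double observedCount = 8.0; // how often the pair occurs in the data
            double c = 4.0;             // GIS constant: max active predicates per case

            for (int iteration = 0; iteration < 100; iteration++) {
                // expected count of the pair under the current weights;
                // made-up stand-in for scoring every training case
                double expectedCount = 5.0 + weight;

                // multiplicative GIS update: push expected toward observed,
                // damped by 1/C
                weight *= Math.pow(observedCount / expectedCount, 1.0 / c);
            }
            System.out.println("final weight = " + weight); // approaches 3.0 here
        }
    }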
Hope that helps.
On 2/6/17, 12:38 PM, "Manoj B. Narayanan" <[email protected]>
wrote:
Hi,
I have been using OpenNLP for a while now. I have been training models
with custom data along with predefined features as well as custom features.
Could someone explain to me, or guide me to some documentation of, what is
happening internally?
The things I am particularly interested in are:
1. What is happening during each iteration?
2. How are the log likelihood and probability calculated at each step?
3. How is a test case classified?
4. What happens during training?
5. How does maximum entropy work?
Someone please guide me.
Thanks.
Manoj