Hi Jeff! Thank you so much for your fast reply. I have a question. Suppose we use this feature with a window of:
2 tokens on the left + *ENTITY* + 2 tokens on the right

My question is: how do I train the model correctly? If only the previous 2 tokens and the next 2 tokens matter, I should not use long sentences to train the model, right?

For example (person-model.train):

1. I am <START:person> Barack <END> and I am the president of USA
2. My name is <START:person> Barack <END> and my surname is Obama
...

These are two silly training samples, just to illustrate my question. In this case I would have:

*I am Barack and I*
*name is Barack and my*

The other tokens (left and right) do not matter. So the sentences in my training set should be very short, right? Basically, I should only define all the "combinations" of the previous/next 2 tokens, right?

Thank you!
Damiano

2016-05-01 16:07 GMT+02:00 Jeffrey Zemerick <[email protected]>:

> I think you are looking for the WindowFeatureGenerator [1]. You can set the
> size of the window by specifying the number of previous tokens and number
> of next tokens.
>
> Jeff
>
> [1]
> https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/WindowFeatureGenerator.html
>
> On Sun, May 1, 2016 at 5:16 AM, Damiano Porta <[email protected]> wrote:
>
> > Hello everybody
> > How many surrounding tokens are taken into account to find the entity using
> > a maxent model?
> > Basically, a maxent model should detect an entity by looking at the surrounding
> > tokens, right?
> > I would like to understand:
> >
> > 1. Can I set the number of tokens on the left side?
> > 2. Can I set the number of tokens on the right side too?
> >
> > Thank you in advance for the clarification
> > Best
> >
> > Damiano
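
P.S. To make the window from the top of this message concrete (2 tokens on the left + *ENTITY* + 2 tokens on the right), here is a minimal Python sketch that cuts that window out of the two training samples. It is only an illustration of the idea: it is not OpenNLP code, it does not show how WindowFeatureGenerator works internally, and the names `parse_sample`, `window_context` and `entity_windows` are hypothetical. It also says nothing about whether the model uses other features beyond this window.

```python
def parse_sample(line):
    """Split a '<START:type> ... <END>' annotated line into plain tokens
    and a list of (start, end) entity spans (end is exclusive)."""
    tokens, spans, start = [], [], None
    for tok in line.split():
        if tok.startswith("<START:") and tok.endswith(">"):
            start = len(tokens)              # entity begins at the next token
        elif tok == "<END>":
            spans.append((start, len(tokens)))
            start = None
        else:
            tokens.append(tok)
    return tokens, spans


def window_context(tokens, start, end, prev_size=2, next_size=2):
    """Return the tokens left of, inside, and right of the span,
    limited to prev_size / next_size tokens on each side."""
    left = tokens[max(0, start - prev_size):start]
    entity = tokens[start:end]
    right = tokens[end:end + next_size]
    return left, entity, right


def entity_windows(samples, prev_size=2, next_size=2):
    """Collect the windowed text around every annotated entity."""
    result = []
    for line in samples:
        tokens, spans = parse_sample(line)
        for start, end in spans:
            left, entity, right = window_context(
                tokens, start, end, prev_size, next_size)
            result.append(" ".join(left + entity + right))
    return result


SAMPLES = [
    "I am <START:person> Barack <END> and I am the president of USA",
    "My name is <START:person> Barack <END> and my surname is Obama",
]

if __name__ == "__main__":
    for text in entity_windows(SAMPLES):
        print(text)
```

With the two samples above, this prints the same two fragments quoted in the message: "I am Barack and I" and "name is Barack and my".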
