I'm sure the others on this list can give you a more complete answer, but I will try not to lead you astray.
The WindowFeatureGenerator is only one of the available feature generators. There are many classes that implement the AdaptiveFeatureGenerator interface [1], and you can, of course, provide your own implementation of that interface to support additional features. For example, the SentenceFeatureGenerator [2] looks at the beginning and end of each training sentence.

So to answer your question, the length of the training sentence should not matter. What matters is whether the combination of configured feature generators can produce a model that accurately describes the training text.

Jeff

[1] https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/AdaptiveFeatureGenerator.html
[2] https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/SentenceFeatureGenerator.html

On Sun, May 1, 2016 at 12:02 PM, Damiano Porta <[email protected]> wrote:

> Hi Jeff!
> Thank you so much for your fast reply.
>
> I have a doubt. Let's suppose we use this feature with a window of:
>
> 2 tokens on the left + *ENTITY* + 2 tokens on the right
>
> My doubt is: how can I train the model correctly?
>
> If only the previous 2 tokens and the next 2 tokens matter, I should not
> use long sentences to train the model. Right?
>
> For example (person-model.train):
>
> 1. I am <START:person> Barack <END> and I am the president of USA
>
> 2. My name is <START:person> Barack <END> and my surname is Obama
>
> ...
>
> Those are two stupid training samples; they are just to illustrate my doubt.
>
> In this case I should have:
>
> *I am Barack and I*
>
> *name is Barack and my*
>
> The other tokens (left and right) do not matter. So the sentences in my
> training set should be very short, right? Basically I should only define
> all the "combinations" of the previous/next 2 tokens, right?
>
> Thank you!
> Damiano
>
> 2016-05-01 16:07 GMT+02:00 Jeffrey Zemerick <[email protected]>:
>
> > I think you are looking for the WindowFeatureGenerator [1]. You can set the
> > size of the window by specifying the number of previous tokens and number
> > of next tokens.
> >
> > Jeff
> >
> > [1]
> > https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/WindowFeatureGenerator.html
> >
> > On Sun, May 1, 2016 at 5:16 AM, Damiano Porta <[email protected]> wrote:
> >
> > > Hello everybody
> > > How many surrounding tokens are taken into account to find the entity using
> > > a maxent model?
> > > Basically a maxent model should detect an entity by looking at the surrounding
> > > tokens, right?
> > > I would like to understand:
> > >
> > > 1. Can I set the number of tokens on the left side?
> > > 2. Can I set the number of tokens on the right side too?
> > >
> > > Thank you in advance for the clarification
> > > Best
> > >
> > > Damiano
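
As an illustration of the answer at the top of this thread, here is a rough sketch of why sentence length should not matter with a 2 + 2 token window. A window-style generator is applied at every token position, so a long sentence simply yields more training events, each one limited to its own window. This is plain Python written only for illustration: the function name window_features and the feature labels (w=, p1w=, n1w=, ...) are my own assumptions and are not OpenNLP's API, nor necessarily its exact feature strings.

```python
def window_features(tokens, index, prev_size=2, next_size=2):
    """Return illustrative context features for the token at `index`.

    Hypothetical helper: it mimics the idea of a token window,
    not the actual OpenNLP implementation.
    """
    # Feature for the current token itself.
    features = [f"w={tokens[index]}"]
    # Up to `prev_size` tokens to the left, nearest first.
    for offset in range(1, prev_size + 1):
        if index - offset >= 0:
            features.append(f"p{offset}w={tokens[index - offset]}")
    # Up to `next_size` tokens to the right, nearest first.
    for offset in range(1, next_size + 1):
        if index + offset < len(tokens):
            features.append(f"n{offset}w={tokens[index + offset]}")
    return features


# First training sample from the thread, with the entity markup removed.
tokens = "I am Barack and I am the president of USA".split()

# One set of features per token position: the full sentence is kept,
# but each position only sees its own window.
for i, token in enumerate(tokens):
    print(token, window_features(tokens, i))
```

For the token "Barack" the sketch only sees "I am" on the left and "and I" on the right, while the remaining tokens of the sentence contribute their own windows. Trimming the training sentences by hand would therefore remove those extra examples without changing what the window sees for "Barack".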
