Re: Surrounding tokens of the entity on MaxEnt models

Jeffrey Zemerick Sun, 01 May 2016 12:14:25 -0700

I'm sure the others on this list can give you a more complete answer so I
will try to not lead you astray.


The WindowFeatureGenerator is only one of the available feature generators.
There are many classes that implement the AdaptiveFeatureGenerator
interface [1] and you can, of course, provide your own implementation of
that interface to support additional features. For example, the
SentenceFeatureGenerator [2] looks at the beginning and end of each
training sentence. So, to answer your question, the length of the training
sentence should not matter. What matters is whether the combination of
configured feature generators yields a model that accurately describes the
training text.
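
As a rough illustration (a plain-Python sketch, not OpenNLP's actual code),
the snippet below mimics what a window generator and a sentence generator
might emit for a single token. The function names and the feature strings
are hypothetical and chosen only for this example; the sample sentence is
the one from your message.

```python
from typing import List


def window_features(tokens: List[str], index: int,
                    prev_size: int = 2, next_size: int = 2) -> List[str]:
    """Hypothetical helper: the token at `index` plus its neighbours in the window."""
    features = [f"w={tokens[index]}"]
    # Up to prev_size tokens to the left, nearest first.
    for offset in range(1, prev_size + 1):
        if index - offset >= 0:
            features.append(f"p{offset}w={tokens[index - offset]}")
    # Up to next_size tokens to the right, nearest first.
    for offset in range(1, next_size + 1):
        if index + offset < len(tokens):
            features.append(f"n{offset}w={tokens[index + offset]}")
    return features


def sentence_features(tokens: List[str], index: int) -> List[str]:
    """Hypothetical helper: fires only on the first and last token of a sentence."""
    features = []
    if index == 0:
        features.append(f"S=begin,w={tokens[0]}")
    if index == len(tokens) - 1:
        features.append(f"S=end,w={tokens[-1]}")
    return features


def combined_features(tokens: List[str], index: int,
                      prev_size: int = 2, next_size: int = 2) -> List[str]:
    """Concatenate the output of both generators for one token position."""
    return (window_features(tokens, index, prev_size, next_size)
            + sentence_features(tokens, index))


tokens = "My name is Barack and my surname is Obama".split()
barack = tokens.index("Barack")
print(combined_features(tokens, barack))
```

In this sketch the window generator never looks past two tokens on either
side of "Barack", however long the sentence is, while the sentence generator
contributes features only at the sentence boundaries. Each configured
generator simply adds its own features to the same list.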

Jeff

[1]
https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/AdaptiveFeatureGenerator.html
[2]
https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/SentenceFeatureGenerator.html


On Sun, May 1, 2016 at 12:02 PM, Damiano Porta <[email protected]>
wrote:

> Hi Jeff!
> Thank you so much for your fast reply.
>
> I have a question. Suppose we use this feature with a window of:
>
> 2 tokens on the left + *ENTITY* + 2 tokens on the right
>
> How can I train the model correctly?
>
> If only the previous 2 tokens and the next 2 tokens matter, I should not
> use long sentences to train the model, right?
>
> For example (person-model.train):
>
> 1. I am <START:person> Barack <END> and I am the president of USA
>
> 2. My name is <START:person> Barack <END> and my surname is Obama
>
> ...
>
> Those are two trivial training samples, just to illustrate my question.
>
> In this case I should have:
>
> *I am Barack and I*
>
> *name is Barack and my*
>
> The other tokens (left and right) do not matter, so the sentences in my
> training set should be very short, right? Basically I should only define
> all the "combinations" of the previous/next 2 tokens, right?
>
> Thank you!
> Damiano
>
>
>
> 2016-05-01 16:07 GMT+02:00 Jeffrey Zemerick <[email protected]>:
>
> > I think you are looking for the WindowFeatureGenerator [1]. You can set the
> > size of the window by specifying the number of previous tokens and number
> > of next tokens.
> >
> > Jeff
> >
> > [1]
> > https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/WindowFeatureGenerator.html
> >
> >
> > On Sun, May 1, 2016 at 5:16 AM, Damiano Porta <[email protected]>
> > wrote:
> > >
> > > Hello everybody
> > > How many surrounding tokens are taken into account to find the entity
> > > using a maxent model?
> > > Basically a maxent model should detect an entity by looking at the
> > > surrounding tokens, right?
> > > I would like to understand:
> > >
> > > 1. Can I set the number of tokens on the left side?
> > > 2. Can I set the number of tokens on the right side too?
> > >
> > > Thank you in advance for the clarification
> > > Best
> > >
> > > Damiano
> >
>
