Hello William! Thanks!

Yes, I know that I can use DictionaryNameFinder, but I was studying how the
feature generators work and how they affect name recognition.

If we use DictionaryFeatureGenerator, each token that matches a dictionary
entry will be labeled with a specific code. From what I read, it is *w:dic*
plus the *token* itself (
https://github.com/apache/opennlp/blob/164331477b1cea0942dcf6f07714fd50d8e2687e/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/InSpanGenerator.java#L72-L73
)
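
To make the idea concrete, here is a minimal sketch of what such a generator does. It is a toy, not the OpenNLP implementation: the class DictionaryFeatureSketch and the method generate are hypothetical names, the dictionary is a plain set of single tokens (the real generator works on spans found by a dictionary name finder), and the feature strings follow the notation used in this mail, so the exact strings in OpenNLP may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DictionaryFeatureSketch {

    // Hypothetical helper, not the OpenNLP API: returns one feature list per token.
    static List<List<String>> generate(String[] tokens, Set<String> dictionary) {
        List<List<String>> features = new ArrayList<>();
        for (String token : tokens) {
            List<String> tokenFeatures = new ArrayList<>();
            // An ordinary feature that every token gets (stands in for featureX etc.).
            tokenFeatures.add("w=" + token.toLowerCase());
            // Extra features only for tokens found in the dictionary.
            // Assumption: single-token entries, feature names as written in this mail.
            if (dictionary.contains(token)) {
                tokenFeatures.add("w:dic");
                tokenFeatures.add("w:dic=" + token);
            }
            features.add(tokenFeatures);
        }
        return features;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("John"));
        String[] tokens = {"I", "am", "John"};
        List<List<String>> features = generate(tokens, dictionary);
        // Print each token followed by the features attached to it.
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " " + features.get(i));
        }
    }
}
```

Note that nothing here decides whether a token is an entity. The generator only attaches extra strings to the tokens that match the dictionary.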

In this case we only label the tokens with specific "codes", as the other
features do; we are not saying that THOSE tokens are entities. Right? So,
for example:

*sentence:*

I am <START:person> John <END>

*training:*

I feature1 feature2

am feature1 featureN

John featureX + w:dic

So in this case the algorithm learns that "John", labeled with featureX
+ w:dic, is an entity. We *must* include entries of the dictionary in
the training data, otherwise the machine learning will not "associate"
*w:dic* with the entity. Right?
The more features we add, the easier the classification will be.
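
The point about training data can be sketched as follows. This is only an illustration under assumptions: the class DictionaryEvidenceSketch, the Event type and the outcome labels "person-start" and "other" are hypothetical, and a real trainer estimates weights instead of printing raw counts. It shows only the evidence a trainer would start from: how often the feature *w:dic* appears together with each outcome.

```java
import java.util.Arrays;
import java.util.List;

public class DictionaryEvidenceSketch {

    // One training event: the outcome label of a token plus its features.
    static final class Event {
        final String outcome;
        final List<String> features;

        Event(String outcome, String... features) {
            this.outcome = outcome;
            this.features = Arrays.asList(features);
        }
    }

    // Counts the events that carry the given feature and have the given outcome.
    static int count(List<Event> events, String feature, String outcome) {
        int n = 0;
        for (Event e : events) {
            if (e.outcome.equals(outcome) && e.features.contains(feature)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // "I am <START:person> John <END>" as three events (labels are illustrative).
        List<Event> events = Arrays.asList(
            new Event("other", "w=i"),
            new Event("other", "w=am"),
            new Event("person-start", "w=john", "w:dic"));

        System.out.println("w:dic with person-start: "
            + count(events, "w:dic", "person-start"));
        System.out.println("w:dic with other: "
            + count(events, "w:dic", "other"));
    }
}
```

If no dictionary entry ever occurs inside an annotated entity in the training sentences, the first count stays at zero, and the trainer has no evidence linking *w:dic* to the entity outcome.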

I wrote it badly but hope it makes sense :)

Damiano


On 17 Aug 2016 14:13, "William Colen" <william.co...@gmail.com> wrote:

> Features do not guarantee that the token will be marked as a NE. It is
> like telling the model that, according to the dictionary, the token can be
> a NE, but of course it will be evaluated together with the other features.
> Remember, it is machine learning. You can skip the machine learning by
> using a DictionaryNameFinder.
>
> http://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/namefind/DictionaryNameFinder.html
>
> Regards
> William
>
> 2016-08-16 15:50 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
>
> > Hello,
> >
> > sorry for all these questions, but I am trying to study OpenNLP
> > in depth.
> > I wrote some simple code; you can see it here:
> > https://issues.apache.org/jira/browse/OPENNLP-859?jql=project%20%3D%20OPENNLP
> > I am trying to understand what the generators are and what their job is.
> > I know they add features to the token list, but what does that mean in
> > simple words? (Just adding simple codes to each token?) For example,
> > I tried the DictionaryFeatureGenerator with a simple list of names, but
> > they are not recognized when I use NameFinderME (see the link on Jira).
> >
> > How can I read those features after find()?
> >
> > Thank you so much!
> > Damiano
> >
>
