Hello William! Thanks! Yes, I know that I can use DictionaryNameFinder, but I was studying how the generators work and how they affect name recognition.
If we use DictionaryFeatureGenerator, each token that matches a dictionary entry will be labeled with a specific code (as far as I can read, *w:dic* plus the *token*):
https://github.com/apache/opennlp/blob/164331477b1cea0942dcf6f07714fd50d8e2687e/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/InSpanGenerator.java#L72-L73

In this case we only label the tokens with specific "codes", like the other features do; we are not saying THOSE tokens are entities. Right?

So for example:

*sentence:*
I am <START:person> John <END>

*training:*
I feature1 feature2
am feature1 featureN
John featureX + w:dic

So in this case the algorithm understands that "John", labeled with featureX + w:dic, is an entity. We *must* include the dictionary entries in the training data, otherwise the machine learning will not "associate" *w:dic* with the entity. Right?

The more features we add, the easier the classification will be.

I wrote it badly but hope it makes sense :)

Damiano

On 17 Aug 2016 14:13, "William Colen" <william.co...@gmail.com> wrote:

> Features do not guarantee that the token will be marked as a NE. It is
> like telling the model that, according to the dictionary, the token can be
> a NE, but of course it will be evaluated together with the other features.
> Remember it is machine learning. You can skip the machine learning by
> using a DictionaryNameFinder.
>
> http://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/namefind/DictionaryNameFinder.html
>
> Regards
> William
>
> 2016-08-16 15:50 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
>
> > Hello,
> >
> > pardon guys for all these questions, but I am trying to study OpenNLP
> > deeply.
> > I wrote some simple code, you can see it here:
> > https://issues.apache.org/jira/browse/OPENNLP-859?jql=project%20%3D%20OPENNLP
> > I am trying to understand what the generators are and what their job is.
> > I know they add features to the token list, but what does that mean in
> > simple words? (Just adding simple codes to each token?) I ask because,
> > for example, I tried the DictionaryFeatureGenerator with a simple list
> > of names, but they are not recognized when I use the NameFinderME (see
> > the link on JIRA).
> >
> > How can I read those features after the find()?
> >
> > Thank you so much!
> > Damiano
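
To make the dictionary feature discussed in this thread concrete, here is a minimal sketch in plain Java. It is not OpenNLP code: the class `DictionaryFeatureSketch`, the method `dictionaryFeatures`, and the feature strings `w:dic` and `w:dic=<token>` are assumptions modelled on the label quoted above, and the lookup is simplified to single-token dictionary entries. It only shows that a dictionary hit adds extra feature strings to a token; whether the token ends up tagged as an entity is still decided by the trained model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a dictionary feature generator; not OpenNLP code.
class DictionaryFeatureSketch {

    // Returns the extra features a dictionary lookup adds for tokens[index].
    // The feature strings are assumptions modelled on the "w:dic" label
    // mentioned in this thread; the real strings in OpenNLP may differ.
    static List<String> dictionaryFeatures(String[] tokens, int index, Set<String> dictionary) {
        List<String> features = new ArrayList<>();
        if (dictionary.contains(tokens[index])) {
            features.add("w:dic");                  // "this token is in the dictionary"
            features.add("w:dic=" + tokens[index]); // same hint, specialised to the token
        }
        // No entity decision is made here: the features are only hints
        // that a classifier weighs together with all other features.
        return features;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>(Arrays.asList("John", "Mary"));
        String[] tokens = {"I", "am", "John"};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " -> " + dictionaryFeatures(tokens, i, names));
        }
    }
}
```

Running `main` prints an empty feature list for "I" and "am" and the two dictionary features for "John". If the training data never contains a token that carries these features inside a `<START:person> ... <END>` span, the model has nothing to learn from them, which is the point raised above about including dictionary entries in the training data.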