Damiano Porta created OPENNLP-859:
-------------------------------------
Summary: Cannot get entities from trained model using DictionaryFeatureGenerator
Key: OPENNLP-859
URL: https://issues.apache.org/jira/browse/OPENNLP-859
Project: OpenNLP
Issue Type: Question
Components: Name Finder
Affects Versions: 1.6.0
Environment: Ubuntu 16.04, Java 8
Reporter: Damiano Porta
Hello,
I have created the following training data.
```
Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma .
il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
il mio cap è lo 00144 nella capitale e e il mio nome è <START:person> john <END> .
Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .
```
And then this code:
```
Charset charset = Charset.forName("UTF-8");
ObjectStream<String> lineStream = new PlainTextByLineStream(
        new FileInputStream("/home/damiano/person.train"), charset);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

TokenNameFinderModel model;

Dictionary dictionary = new Dictionary();
dictionary.put(new StringList(new String[]{"giovanni"}));
dictionary.put(new StringList(new String[]{"maria"}));
dictionary.put(new StringList(new String[]{"luca"}));

BufferedOutputStream aa = null;

AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
        new AdaptiveFeatureGenerator[]{
            new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
            new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
            new OutcomePriorFeatureGenerator(),
            new PreviousMapFeatureGenerator(),
            new BigramNameFeatureGenerator(),
            new SentenceFeatureGenerator(true, false),
            new DictionaryFeatureGenerator("person", dictionary)
        });

try {
    model = NameFinderME.train("it", "person", sampleStream,
            TrainingParameters.defaultParams(),
            featureGenerator, Collections.<String, Object>emptyMap());
} finally {
    sampleStream.close();
}

// Save trained model
try (BufferedOutputStream modelOut = new BufferedOutputStream(
        new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
    model.serialize(modelOut);
}

// Read the trained model
try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {
    TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);
    NameFinderME nameFinder = new NameFinderME(nerModel,
            featureGenerator, NameFinderME.DEFAULT_BEAM_SIZE);

    String sentence[] = new String[]{
        "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
    };

    Span nameSpans[] = nameFinder.find(sentence);
    System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
}
```
When I try `"Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."`
it correctly detects "Damiano" as a person, but if I change the input to
`"Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."`
it does not detect "maria" as a person, even though I added "maria" to the
dictionary and expected it to be found. Why not?
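For anyone reproducing this, one thing that may be worth checking is how often the dictionary entries actually coincide with annotated names in the training data, since a feature that rarely fires during training may end up with little or no weight (and could be dropped entirely if a feature cutoff applies). The sketch below uses only the JDK, not OpenNLP; the class name `DictionaryCoverage` and the method `countDictionaryHits` are hypothetical helpers, and the lowercase comparison assumes the dictionary is case-insensitive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, not part of OpenNLP.
class DictionaryCoverage {

    // Matches spans annotated as <START:person> ... <END> in the training format.
    private static final Pattern PERSON_SPAN =
            Pattern.compile("<START:person>\\s+(.+?)\\s+<END>");

    /** Counts annotated person names that also occur among the dictionary entries. */
    static int countDictionaryHits(List<String> trainingLines, Set<String> dictionaryEntries) {
        int hits = 0;
        for (String line : trainingLines) {
            Matcher m = PERSON_SPAN.matcher(line);
            while (m.find()) {
                // Assumption: the dictionary lookup ignores case, so compare in lowercase.
                if (dictionaryEntries.contains(m.group(1).toLowerCase())) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The five training sentences from this report, one sentence per line.
        List<String> trainingLines = Arrays.asList(
            "Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma .",
            "il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .",
            "il mio cap è lo 00144 nella capitale e e il mio nome è <START:person> john <END> .",
            "Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .",
            "Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .");

        // The three entries added to the Dictionary in the code above.
        Set<String> dictionaryEntries =
                new HashSet<String>(Arrays.asList("giovanni", "maria", "luca"));

        System.out.println(countDictionaryHits(trainingLines, dictionaryEntries));
    }
}
```

With the data in this report, only "giovanni" is both annotated and in the dictionary, so the dictionary feature would coincide with a name in a single training sentence.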
Thanks!