[
https://issues.apache.org/jira/browse/OPENNLP-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Damiano Porta updated OPENNLP-859:
----------------------------------
Description:
Hello,
I have created the following training data.
{code:title=train.txt|borderStyle=solid}
Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma .
il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
il mio cap è lo 00144 nella capitale e e il mio nome è <START:person> john <END> .
Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .
{code}
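To make the annotation format above concrete, here is a minimal standard-library sketch of how one whitespace-tokenized line maps to tokens and an entity span. It is only an illustration and not OpenNLP's own parser (in the code below that job is done by NameSampleDataStream). The class name AnnotatedLineSketch, the Parsed holder and the "type[start..end)" notation are hypothetical names chosen for this example; the end index is exclusive.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotatedLineSketch {

    /** Tokens of one sentence plus its entity spans, written as "type[start..end)". */
    static final class Parsed {
        final List<String> tokens = new ArrayList<>();
        final List<String> spans = new ArrayList<>();
    }

    /** Splits a line on whitespace and separates the markup tags from the real tokens. */
    static Parsed parse(String line) {
        Parsed result = new Parsed();
        String openType = null; // entity type of the span currently open, if any
        int openStart = -1;     // token index where that span starts

        for (String piece : line.trim().split("\\s+")) {
            if (piece.isEmpty()) {
                continue; // only happens for an empty input line
            }
            if (piece.startsWith("<START:") && piece.endsWith(">")) {
                openType = piece.substring("<START:".length(), piece.length() - 1);
                openStart = result.tokens.size();
            } else if (piece.equals("<END>")) {
                if (openType == null) {
                    throw new IllegalArgumentException("<END> without a matching <START:...>");
                }
                // The end index is exclusive: it points one past the last name token.
                result.spans.add(openType + "[" + openStart + ".." + result.tokens.size() + ")");
                openType = null;
            } else {
                result.tokens.add(piece); // ordinary token, tags are never stored
            }
        }
        if (openType != null) {
            throw new IllegalArgumentException("<START:" + openType + "> is never closed");
        }
        return result;
    }

    public static void main(String[] args) {
        Parsed p = parse("Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma .");
        System.out.println(p.tokens);
        System.out.println(p.spans);
    }
}
```

The sketch relies on every sentence sitting on a single line with tokens and tags separated by spaces, which is how the training file above is laid out.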
And then this code:
{code:title=test.java|borderStyle=solid}
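import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Collections;

import opennlp.tools.dictionary.Dictionary;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.Span;
import opennlp.tools.util.StringList;
import opennlp.tools.util.TrainingParameters;
import opennlp.tools.util.featuregen.*;

// The statements below run inside a method that throws IOException.

// Read the annotated training sentences, one per line
Charset charset = Charset.forName("UTF-8");
ObjectStream<String> lineStream = new PlainTextByLineStream(
        new FileInputStream("/home/damiano/person.train"), charset);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

// Dictionary of known first names, passed to the DictionaryFeatureGenerator
Dictionary dictionary = new Dictionary();
dictionary.put(new StringList(new String[]{"giovanni"}));
dictionary.put(new StringList(new String[]{"maria"}));
dictionary.put(new StringList(new String[]{"luca"}));

// Custom feature generators, including the dictionary lookup
AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
        new AdaptiveFeatureGenerator[]{
            new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
            new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
            new OutcomePriorFeatureGenerator(),
            new PreviousMapFeatureGenerator(),
            new BigramNameFeatureGenerator(),
            new SentenceFeatureGenerator(true, false),
            new DictionaryFeatureGenerator("person", dictionary)
        });

// Train the model
TokenNameFinderModel model;
try {
    model = NameFinderME.train("it", "person", sampleStream,
            TrainingParameters.defaultParams(),
            featureGenerator, Collections.<String, Object>emptyMap());
} finally {
    sampleStream.close();
}

// Save trained model
try (BufferedOutputStream modelOut = new BufferedOutputStream(
        new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
    model.serialize(modelOut);
}

// Read the trained model and run it on one tokenized sentence
try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {
    TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);
    NameFinderME nameFinder = new NameFinderME(nerModel,
            featureGenerator, NameFinderME.DEFAULT_BEAM_SIZE);

    String[] sentence = new String[]{
        "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
    };

    Span[] nameSpans = nameFinder.find(sentence);
    System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
}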
{code}
When I try `"Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."`
it correctly detects "Damiano" as PERSON, but if I change the input to
`"Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."`
it does not detect "maria" as PERSON, even though I added "maria" to the
dictionary. I expected the dictionary entry to be enough for it to be found.
Why is it not?
Thanks!
> Cannot get entities from trained model using DictionaryFeatureGenerator
> ------------------------------------------------------------------------
>
> Key: OPENNLP-859
> URL: https://issues.apache.org/jira/browse/OPENNLP-859
> Project: OpenNLP
> Issue Type: Question
> Components: Name Finder
> Affects Versions: 1.6.0
> Environment: ubuntu 16.04 java 8
> Reporter: Damiano Porta
>