[
https://issues.apache.org/jira/browse/OPENNLP-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martin Wiesner closed OPENNLP-859.
----------------------------------
Resolution: Feedback Received
This question received feedback, but the reporter has not replied since 2017.
Closing.
> Cannot get entities from trained model using DictionaryFeatureGenerator
> ------------------------------------------------------------------------
>
> Key: OPENNLP-859
> URL: https://issues.apache.org/jira/browse/OPENNLP-859
> Project: OpenNLP
> Issue Type: Question
> Components: Name Finder
> Affects Versions: 1.6.0
> Environment: ubuntu 16.04 java 8
> Reporter: Damiano Porta
> Priority: Major
>
> Hello,
> I have created the following training data.
> {code:title=train.txt|borderStyle=solid}
> Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma .
> il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
> il mio cap è lo 00144 nella capitale e e il mio nome è <START:person> john <END> .
> Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
> Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .
> {code}
> And then this code:
> {code:title=test.java|borderStyle=solid}
> Charset charset = Charset.forName("UTF-8");
> ObjectStream<String> lineStream = new PlainTextByLineStream(
>     new FileInputStream("/home/damiano/person.train"), charset);
> ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
> TokenNameFinderModel model;
>
> // Dictionary of known person names
> Dictionary dictionary = new Dictionary();
> dictionary.put(new StringList(new String[]{"giovanni"}));
> dictionary.put(new StringList(new String[]{"maria"}));
> dictionary.put(new StringList(new String[]{"luca"}));
>
> BufferedOutputStream aa = null;
>
> // Feature generators used for both training and detection
> AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
>     new AdaptiveFeatureGenerator[]{
>         new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
>         new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
>         new OutcomePriorFeatureGenerator(),
>         new PreviousMapFeatureGenerator(),
>         new BigramNameFeatureGenerator(),
>         new SentenceFeatureGenerator(true, false),
>         new DictionaryFeatureGenerator("person", dictionary)
>     });
>
> // Train the model
> try {
>     model = NameFinderME.train("it", "person", sampleStream,
>         TrainingParameters.defaultParams(),
>         featureGenerator, Collections.<String, Object>emptyMap());
> }
> finally {
>     sampleStream.close();
> }
>
> // Save trained model
> try (BufferedOutputStream modelOut = new BufferedOutputStream(
>         new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
>     model.serialize(modelOut);
> }
>
> // Read the trained model
> try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {
>     TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);
>     NameFinderME nameFinder = new NameFinderME(nerModel,
>         featureGenerator, NameFinderME.DEFAULT_BEAM_SIZE);
>
>     String[] sentence = new String[]{
>         "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
>     };
>
>     Span[] nameSpans = nameFinder.find(sentence);
>
>     System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
> }
> {code}
> When I try
> {code}
> "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
> {code}
> it correctly detects "Damiano" as PERSON, but if I change it to:
> {code}
> "Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."
> {code}
> it does not detect "maria" as PERSON, even though I added "maria" to the
> dictionary, so I expected it to be found. Why not?
> Thanks!
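
One possible explanation, not confirmed anywhere in this thread, is that a dictionary feature generator only contributes a feature to the statistical model and does not force a match. If training discards features that occur fewer times than a cutoff, a dictionary feature that fires once in five sentences would carry no weight. The plain-Java sketch below does not use OpenNLP at all; it only counts how often a dictionary entry appears in the training data above. The class name, the helper methods, the exact-match lookup, and the cutoff value of 5 are assumptions made for illustration and should be checked against the OpenNLP version in use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DictionaryCutoffSketch {

    // Assumed cutoff value; verify it against TrainingParameters in your OpenNLP version.
    static final int ASSUMED_CUTOFF = 5;

    // Tokens of the five training sentences, with the <START:person>/<END> tags removed.
    // "\u00e8" is the letter e with grave accent, escaped to keep the source ASCII-only.
    static List<String[]> trainingSentences() {
        return Arrays.asList(
            "Ciao mi chiamo Damiano ed abito a Roma .".split(" "),
            "il mio indirizzo \u00e8 via del Corso nella provincia di Roma .".split(" "),
            "il mio cap \u00e8 lo 00144 nella capitale e e il mio nome \u00e8 john .".split(" "),
            "Abito a Roma in via tar dei tali 10 , Mario \u00e8 il mio amico .".split(" "),
            "Oggi ho incontrato giovanni e siamo andati a giocare a calcio .".split(" "));
    }

    // The three dictionary entries from the reporter's code.
    static Set<String> dictionaryEntries() {
        return new HashSet<>(Arrays.asList("giovanni", "maria", "luca"));
    }

    // Counts tokens that match a dictionary entry exactly (a simplification of a real lookup).
    static int countDictionaryHits(List<String[]> sentences, Set<String> dictionary) {
        int hits = 0;
        for (String[] sentence : sentences) {
            for (String token : sentence) {
                if (dictionary.contains(token)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Assumed rule: a feature is kept only if it was seen at least `cutoff` times.
    static boolean survivesCutoff(int occurrences, int cutoff) {
        return occurrences >= cutoff;
    }

    public static void main(String[] args) {
        int hits = countDictionaryHits(trainingSentences(), dictionaryEntries());
        System.out.println("dictionary hits in training data: " + hits);
        System.out.println("survives cutoff of " + ASSUMED_CUTOFF + ": "
            + survivesCutoff(hits, ASSUMED_CUTOFF));
    }
}
```

If this is indeed the cause, training on more sentences that contain dictionary names, or lowering the cutoff in the training parameters, might be worth trying.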