Damiano Porta created OPENNLP-859:
-------------------------------------

             Summary: Cannot get entities from trained model using DictionaryFeatureGenerator
                 Key: OPENNLP-859
                 URL: https://issues.apache.org/jira/browse/OPENNLP-859
             Project: OpenNLP
          Issue Type: Question
          Components: Name Finder
    Affects Versions: 1.6.0
         Environment: ubuntu 16.04 java 8
            Reporter: Damiano Porta


Hello,
I have created the following training data.

```
Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .
il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
il mio cap è lo 00144 nella capitale e e il mio nome è  <START:person> john <END> .
Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .
```
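To double-check which tokens the data above actually marks as `person`, here is a minimal stand-alone sketch. It uses a plain Java regex rather than OpenNLP's own parser, and the class name `AnnotatedNames` and the method `extract` are hypothetical helpers introduced only for illustration. It assumes each sentence sits on a single line, with whitespace around the `<START:person>` and `<END>` markers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: lists the annotated spans in a training line.
class AnnotatedNames {

    // Captures the entity type and the tokens between "<START:type>" and "<END>".
    private static final Pattern SPAN =
            Pattern.compile("<START:(\\w+)>\\s+(.+?)\\s+<END>");

    // Returns the text of every span of the given type found in one training line.
    static List<String> extract(String line, String type) {
        List<String> names = new ArrayList<>();
        Matcher m = SPAN.matcher(line);
        while (m.find()) {
            if (m.group(1).equals(type)) {
                names.add(m.group(2));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The five training sentences from this post, one per line.
        String[] lines = {
            "Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .",
            "il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .",
            "il mio cap è lo 00144 nella capitale e e il mio nome è  <START:person> john <END> .",
            "Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .",
            "Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio ."
        };
        for (String line : lines) {
            System.out.println(extract(line, "person"));
        }
    }
}
```

Running it prints one list per sentence, which makes it easy to see that the annotated names are Damiano, Corso, john, Mario and giovanni.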
And then this code:

```

        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream =
                        new PlainTextByLineStream(new FileInputStream("/home/damiano/person.train"), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;

        Dictionary dictionary = new Dictionary();
        dictionary.put(new StringList(new String[]{"giovanni"}));
        dictionary.put(new StringList(new String[]{"maria"}));
        dictionary.put(new StringList(new String[]{"luca"}));
      
        BufferedOutputStream aa = null;
          
        AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
                 new AdaptiveFeatureGenerator[]{
                    new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
                    new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
                    new OutcomePriorFeatureGenerator(),
                    new PreviousMapFeatureGenerator(),
                    new BigramNameFeatureGenerator(),
                    new SentenceFeatureGenerator(true, false),
                    new DictionaryFeatureGenerator("person", dictionary)
                   });

        try {
            model = NameFinderME.train("it", "person", sampleStream, TrainingParameters.defaultParams(),
                    featureGenerator, Collections.<String, Object>emptyMap());
        }
        finally {
          sampleStream.close();
        }

        // Save trained model
        try (BufferedOutputStream modelOut = new BufferedOutputStream(
                new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
          model.serialize(modelOut);
        }
                
        // Read the trained model
        try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {

            TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);

            NameFinderME nameFinder = new NameFinderME(nerModel, featureGenerator, NameFinderME.DEFAULT_BEAM_SIZE);
          
            String sentence[] = new String[]{
                "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
            };
            
            Span nameSpans[] = nameFinder.find(sentence);                     
          
            System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
        }      
```

When I try `"Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."`,
it correctly detects "Damiano" as PERSON, but if I change it to:

"Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."

it does not detect "maria" as PERSON, even though I added "maria" to the
dictionary, so I expected it to be found. Why not?
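One thing that may be worth checking, offered as a guess and not as a confirmed cause: a trained model can only learn to rely on a feature if that feature fires on annotated names during training. The sketch below is plain Java with no OpenNLP calls, and the class `DictionaryCoverage` and its methods are hypothetical names used for illustration. It compares the dictionary entries from the code above with the names annotated in the training data, using exact, case-sensitive matching as an assumption.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic, not part of OpenNLP.
class DictionaryCoverage {

    // Dictionary entries that also occur as annotated names in the training data.
    static Set<String> covered(Collection<String> dictionary, Collection<String> trainingNames) {
        Set<String> result = new TreeSet<>(dictionary);
        result.retainAll(new HashSet<>(trainingNames));
        return result;
    }

    // Dictionary entries that never occur as annotated names in the training data.
    static Set<String> uncovered(Collection<String> dictionary, Collection<String> trainingNames) {
        Set<String> result = new TreeSet<>(dictionary);
        result.removeAll(new HashSet<>(trainingNames));
        return result;
    }

    public static void main(String[] args) {
        // Entries put into the Dictionary in the code above.
        List<String> dictionary = List.of("giovanni", "maria", "luca");
        // Names annotated with <START:person> in the training data above.
        List<String> trainingNames = List.of("Damiano", "Corso", "john", "Mario", "giovanni");

        System.out.println("covered:   " + covered(dictionary, trainingNames));
        System.out.println("uncovered: " + uncovered(dictionary, trainingNames));
    }
}
```

With the data in this post, only "giovanni" is both in the dictionary and annotated in the training set. "maria" and "luca" never appear in the training data.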

Thanks!



