Jorn,
At the moment I am using the command-line tool to train my NER model, but I am
getting this error:

*opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
-model /home/damiano/it-person-perceptron.bin -featuregen
/home/damiano/test.xml -sequenceCodec BIO -resources
/home/damiano/lavoro/java/Parser/src/main/resources/*

*Exception in thread "main"
opennlp.tools.namefind.TokenNameFinderModel$FeatureGeneratorCreationError:
opennlp.tools.util.InvalidFormatException: No dictionary resource for key:
nations.dictionary*
    at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:209)
    at opennlp.tools.namefind.TokenNameFinderFactory.createContextGenerator(TokenNameFinderFactory.java:150)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:241)
    at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
    at opennlp.tools.cmdline.CLI.main(CLI.java:244)
Caused by: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dict
    at opennlp.tools.util.featuregen.GeneratorFactory$DictionaryFeatureGeneratorFactory.create(GeneratorFactory.java:251)
    at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
    at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
    at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
    at opennlp.tools.util.featuregen.GeneratorFactory$CachedFeatureGeneratorFactory.create(GeneratorFactory.java:172)
    at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
    at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
    at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
    at opennlp.tools.util.featuregen.GeneratorFactory.create(GeneratorFactory.java:782)
    at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:189)
    ... 4 more

As you can see, the problem is "No dictionary resource for key:
nations.dictionary". It appears because I also need to add a dictionary to
my model.

I ran these tests:

*1. Used the name nations.dictionary as the resource name in my generators.xml
and <dictionary dict="nations.dictionary" prefix="nation" />*

*2. Used the name nations.xml as the resource name in my generators.xml and
<dictionary dict="nations.xml" prefix="nation" />*

*3. Used the name nations.dict as the resource name in my generators.xml and
<dictionary dict="nations.dict" prefix="nation" />*

For each test I also renamed the dictionary file inside my
-resources directory to match.

I had no luck.
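
To rule out a plain naming mismatch, a small stand-alone check like the sketch below can compare the dict keys in the descriptor against the file names in the resources directory. It assumes the key is expected to equal a file name, which is exactly the point I am unsure about. ResourceKeyCheck and missingDictionaryKeys are hypothetical names for illustration, not part of OpenNLP, and the code uses only the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not an OpenNLP class.
final class ResourceKeyCheck {

    /**
     * Returns every dict="..." key found on a <dictionary> element of the
     * descriptor that has no identically named entry in fileNames.
     * Assumption: a key is expected to match a file name exactly.
     */
    static List<String> missingDictionaryKeys(String descriptorXml, Set<String> fileNames) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalArgumentException("Cannot parse descriptor", e);
        }

        List<String> missing = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("dictionary");
        for (int i = 0; i < nodes.getLength(); i++) {
            String key = ((Element) nodes.item(i)).getAttribute("dict");
            if (!fileNames.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String xml = "<generators>"
                + "<dictionary dict=\"nations.dictionary\" prefix=\"nation\" />"
                + "</generators>";
        // The directory listing is simulated here with a fixed set of names.
        System.out.println(missingDictionaryKeys(xml, Set.of("nations.xml")));
    }
}
```

With the simulated listing above, the key nations.dictionary is reported as missing because only nations.xml is present. An empty result would mean the names line up and the cause lies elsewhere.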

How should I name a dictionary resource?

Thanks.



2017-06-07 16:20 GMT+02:00 Damiano Porta <[email protected]>:

> Hello Jorn,
> I confirm the error. Please take a look at the code below. It is a working
> example, you only need to edit the constants GENERATORS, POSTAGGER and
> SERIALIZED.
>
>
> *TEST FILE:*
>
> package com.damiano.trainer;
>
> import java.io.BufferedOutputStream;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> import opennlp.tools.ml.perceptron.PerceptronTrainer;
> import opennlp.tools.namefind.BioCodec;
> import opennlp.tools.namefind.NameFinderME;
> import opennlp.tools.namefind.NameSample;
> import opennlp.tools.namefind.TokenNameFinderFactory;
> import opennlp.tools.namefind.TokenNameFinderModel;
> import opennlp.tools.postag.POSModel;
> import opennlp.tools.util.ObjectStream;
> import opennlp.tools.util.ObjectStreamUtils;
> import opennlp.tools.util.TrainingParameters;
> import org.apache.commons.io.IOUtils;
>
> public class Test {
>
>     private final String GENERATORS = "/home/damiano/test.xml";
>     private final String POSTAGGER = "/home/damiano/postagger.bin";
>     private final String SERIALIZED = "/home/damiano/serialized.bin";
>
>     public static void main(String[] args) throws IOException {
>         Test test = new Test();
>     }
>
>     public Test() throws IOException {
>
>         List<NameSample> labelled = new ArrayList<>();
>
>         labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));
>
>         TokenNameFinderFactory factory;
>
>         try (ObjectStream<NameSample> samples = ObjectStreamUtils.createObjectStream(labelled)) {
>             //HashMap<String, Object> map = new HashMap<>();
>
>             try (InputStream in = new FileInputStream(GENERATORS)) {
>
>                 // Resources
>                 Map<String, Object> map = new HashMap<>();
>
>                 // Pos Tagger
>                 map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));
>
>
>                 // Factory
>                 factory = new TokenNameFinderFactory(
>                     IOUtils.toByteArray(in),
>                     map,
>                     new BioCodec()
>                 );
>
>                 try {
>
>                     TrainingParameters mlParams = new TrainingParameters();
>                     mlParams.put(TrainingParameters.ALGORITHM_PARAM, PerceptronTrainer.PERCEPTRON_VALUE);
>                     mlParams.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(300));
>                     mlParams.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));
>
>                     TokenNameFinderModel model = NameFinderME.train("it", "person", samples, mlParams, factory);
>
>                     try (BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
>                         model.serialize(modelOut);
>                     }
>
>                 } catch (Exception ex) {
>                     ex.printStackTrace();
>                 }
>
>             }
>         }
>     }
>
>     public static POSModel loadPosTagger (String modelName) {
>
>         try (InputStream modelIn = new FileInputStream(modelName)) {
>             POSModel model = new POSModel(modelIn);
>             return model;
>         }
>         catch (Exception ex) { ex.printStackTrace();  }
>
>         return null;
>     }
> }
>
> *GENERATORS:*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <generators>
>     <cache>
>         <generators>
>             <window prevLength="4" nextLength="2">
>                 <tokenclass />
>             </window>
>             <window prevLength="4" nextLength="2">
>                 <token />
>             </window>
>             <!-- Pos Tagger -->
>             <window prevLength="4" nextLength="2">
>                 <tokenpos model="postagger.bin" />
>             </window>
>             <definition />
>             <prevmap />
>             <bigram />
>             <sentence begin="true" end="false" />
>         </generators>
>     </cache>
> </generators>
>
>
> *OUTPUT (with error):*
>
>
> Indexing events using cutoff of 0
> Computing event counts...  done. 30 events
> Indexing...  done.
> Collecting events... Done indexing.
> Incorporating indexed data for training...  done.
> Number of Event Tokens: 30
> Number of Outcomes: 2
> Number of Predicates: 144
> Computing model parameters...
> Performing 300 iterations.
>   1:  . (27/30) 0.9
>   2:  . (30/30) 1.0
>   3:  . (30/30) 1.0
>   4:  . (30/30) 1.0
>   5:  . (30/30) 1.0
> Stopping: change in training set accuracy less than 1.0E-5
> Stats: (30/30) 1.0
> ...done.
> Compressed 144 parameters to 621 outcome patterns
> java.lang.IllegalStateException: Missing serializer for postagger.bin
>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>     at com.damiano.trainer.Test.<init>(Test.java:75)
>     at com.damiano.trainer.Test.main(Test.java:31)
>
> 2017-06-07 15:48 GMT+02:00 Damiano Porta <[email protected]>:
>
>> Hmm, let me try again. Yes, I copied it badly; I think the names are
>> correct. I will give you a working example.
>>
>> 2017-06-07 15:46 GMT+02:00 Joern Kottmann <[email protected]>:
>>
>>> Ok, but are you sure you used matching names? The exception states
>>> it-pos-maxent.bin,
>>> which object did you map to it?
>>>
>>> Jörn
>>>
>>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta <[email protected]>
>>> wrote:
>>>
>>> > Hi Jorn! Yes
>>> >
>>> >         <dependency>
>>> >             <groupId>org.apache.opennlp</groupId>
>>> >             <artifactId>opennlp-tools</artifactId>
>>> >             <version>1.8.0</version>
>>> >         </dependency>
>>> >
>>> > Do I need other dependencies too?
>>> >
>>> >
>>> >
>>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann <[email protected]>:
>>> >
>>> > > This should be working. Did you test with 1.8.0?
>>> > >
>>> > > Jörn
>>> > >
>>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <[email protected]> wrote:
>>> > >
>>> > > > Hello,
>>> > > > I am using the POSTaggerFeatureGenerator via generators.xml
>>> > > >
>>> > > > <tokenpos model="postagger.bin" />
>>> > > >
>>> > > > During training I add this model to the resources like this:
>>> > > >
>>> > > >         HashMap<String, Object> map = new HashMap<>();
>>> > > >         map.put("postagger.bin", myPostaggerModel);
>>> > > >
>>> > > >
>>> > > >          factory = new TokenNameFinderFactory(
>>> > > >                IOUtils.toByteArray(in),
>>> > > >                map,
>>> > > >                new BioCodec()
>>> > > >          );
>>> > > >
>>> > > > I get this error:
>>> > > >
>>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>>> > > > at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>>> > > > at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
>>> > > > at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
>>> > > > 2017-06-05 15:37:35 INFO  Trainer:192 - java.lang.IllegalStateException: Missing serializer for postagger.bin
>>> > > >
>>> > > > Do I have to change the extension of the file?
>>> > > >
>>> > > > Thanks
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
