Jorn,

At the moment I am using the command line tool to train my NER model, but I am getting this error:

opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it -model /home/damiano/it-person-perceptron.bin -featuregen /home/damiano/test.xml -sequenceCodec BIO -resources /home/damiano/lavoro/java/Parser/src/main/resources/

Exception in thread "main" opennlp.tools.namefind.TokenNameFinderModel$FeatureGeneratorCreationError: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dictionary
    at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:209)
    at opennlp.tools.namefind.TokenNameFinderFactory.createContextGenerator(TokenNameFinderFactory.java:150)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:241)
    at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
    at opennlp.tools.cmdline.CLI.main(CLI.java:244)
Caused by: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dict
    at opennlp.tools.util.featuregen.GeneratorFactory$DictionaryFeatureGeneratorFactory.create(GeneratorFactory.java:251)
    at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
    at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
    at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
    at opennlp.tools.util.featuregen.GeneratorFactory$CachedFeatureGeneratorFactory.create(GeneratorFactory.java:172)
    at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
    at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
    at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
    at opennlp.tools.util.featuregen.GeneratorFactory.create(GeneratorFactory.java:782)
    at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:189)
    ... 4 more

As you can see, the problem is "No dictionary resource for key: nations.dictionary", because I also need to add a dictionary to my model.

I ran these tests:

1. Used the name nations.dictionary as the resource name in my generators.xml: <dictionary dict="nations.dictionary" prefix="nation" />
2. Used the name nations.xml as the resource name in my generators.xml: <dictionary dict="nations.xml" prefix="nation" />
3. Used the name nations.dict as the resource name in my generators.xml: <dictionary dict="nations.dict" prefix="nation" />

For each test I also renamed the dictionary file inside my -resources directory to match. I had no luck.

How should I name a dictionary resource?

Thanks.

2017-06-07 16:20 GMT+02:00 Damiano Porta:

> Hello Jorn,
> I confirm the error. Please take a look at the code below. It is a working
> example, you only need to edit the constants GENERATORS, POSTAGGER and
> SERIALIZED.
>
> *TEST FILE:*
>
> package com.damiano.trainer;
>
> import java.io.BufferedOutputStream;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> import opennlp.tools.ml.perceptron.PerceptronTrainer;
> import opennlp.tools.namefind.BioCodec;
> import opennlp.tools.namefind.NameFinderME;
> import opennlp.tools.namefind.NameSample;
> import opennlp.tools.namefind.TokenNameFinderFactory;
> import opennlp.tools.namefind.TokenNameFinderModel;
> import opennlp.tools.postag.POSModel;
> import opennlp.tools.util.ObjectStream;
> import opennlp.tools.util.ObjectStreamUtils;
> import opennlp.tools.util.TrainingParameters;
> import org.apache.commons.io.IOUtils;
>
> public class Test {
>
>     private final String GENERATORS = "/home/damiano/test.xml";
>     private final String POSTAGGER = "/home/damiano/postagger.bin";
>     private final String SERIALIZED = "/home/damiano/serialized.bin";
>
>     public static void main(String[] args) throws IOException {
>         Test test = new Test();
>     }
>
>     public Test() throws IOException {
>
>         List<NameSample> labelled = new ArrayList<>();
>
>         labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
>         labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));
>
>         TokenNameFinderFactory factory;
>
>         try (ObjectStream<NameSample> samples = ObjectStreamUtils.createObjectStream(labelled)) {
>             //HashMap<String, Object> map = new HashMap<>();
>
>             try (InputStream in = new FileInputStream(GENERATORS)) {
>
>                 // Resources
>                 Map<String, Object> map = new HashMap<>();
>
>                 // Pos Tagger
>                 map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));
>
>                 // Factory
>                 factory = new TokenNameFinderFactory(
>                     IOUtils.toByteArray(in),
>                     map,
>                     new BioCodec()
>                 );
>
>                 try {
>
>                     TrainingParameters mlParams = new TrainingParameters();
>                     mlParams.put(TrainingParameters.ALGORITHM_PARAM, PerceptronTrainer.PERCEPTRON_VALUE);
>                     mlParams.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(300));
>                     mlParams.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));
>
>                     TokenNameFinderModel model = NameFinderME.train("it", "person", samples, mlParams, factory);
>
>                     try (BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
>                         model.serialize(modelOut);
>                     }
>
>                 } catch (Exception ex) {
>                     ex.printStackTrace();
>                 }
>
>             }
>         }
>     }
>
>     public static POSModel loadPosTagger (String modelName) {
>
>         try (InputStream modelIn = new FileInputStream(modelName)) {
>             POSModel model = new POSModel(modelIn);
>             return model;
>         }
>         catch (Exception ex) { ex.printStackTrace(); }
>
>         return null;
>     }
> }
>
> *GENERATORS:*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <generators>
>   <cache>
>     <generators>
>       <window prevLength="4" nextLength="2">
>         <tokenclass />
>       </window>
>       <window prevLength="4" nextLength="2">
>         <token />
>       </window>
>       <!-- Pos Tagger -->
>       <window prevLength="4" nextLength="2">
>         <tokenpos model="postagger.bin" />
>       </window>
>       <definition />
>       <prevmap />
>       <bigram />
>       <sentence begin="true" end="false" />
>     </generators>
>   </cache>
> </generators>
>
> *OUTPUT (with error):*
>
> Indexing events using cutoff of 0
> Computing event counts... done. 30 events
> Indexing... done.
> Collecting events... Done indexing.
> Incorporating indexed data for training... done.
> Number of Event Tokens: 30
> Number of Outcomes: 2
> Number of Predicates: 144
> Computing model parameters...
> Performing 300 iterations.
> 1: . (27/30) 0.9
> 2: . (30/30) 1.0
> 3: . (30/30) 1.0
> 4: . (30/30) 1.0
> 5: . (30/30) 1.0
> Stopping: change in training set accuracy less than 1.0E-5
> Stats: (30/30) 1.0
> ...done.
> Compressed 144 parameters to 621 outcome patterns
> java.lang.IllegalStateException: Missing serializer for postagger.bin
>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>     at com.damiano.trainer.Test.<init>(Test.java:75)
>     at com.damiano.trainer.Test.main(Test.java:31)
>
> 2017-06-07 15:48 GMT+02:00 Damiano Porta:
>
>> Hmm, let me try again. Yes, I copied it badly. I think the names are
>> correct, I will give you a working example.
>>
>> 2017-06-07 15:46 GMT+02:00 Joern Kottmann:
>>
>>> Ok, but are you sure you used matching names? The exception states
>>> it-pos-maxent.bin, which object did you map to it?
>>>
>>> Jörn
>>>
>>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta wrote:
>>>
>>> > Hi Jorn! Yes
>>> >
>>> > <dependency>
>>> >   <groupId>org.apache.opennlp</groupId>
>>> >   <artifactId>opennlp-tools</artifactId>
>>> >   <version>1.8.0</version>
>>> > </dependency>
>>> >
>>> > Do I need other dependencies too?
>>> >
>>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann:
>>> >
>>> > > This should be working. Did you test with 1.8.0?
>>> > >
>>> > > Jörn
>>> > >
>>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta wrote:
>>> > >
>>> > > > Hello,
>>> > > > I am using the POSTaggerFeatureGenerator via generators.xml
>>> > > >
>>> > > > <tokenpos model="postagger.bin" />
>>> > > >
>>> > > > During training I add this model to the resources like this:
>>> > > >
>>> > > > HashMap<String, Object> map = new HashMap<>();
>>> > > > map.put("postagger.bin", myPostaggerModel);
>>> > > >
>>> > > > factory = new TokenNameFinderFactory(
>>> > > >     IOUtils.toByteArray(in),
>>> > > >     map,
>>> > > >     new BioCodec()
>>> > > > );
>>> > > >
>>> > > > I get this error:
>>> > > >
>>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>>> > > >     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>>> > > >     at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
>>> > > >     at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
>>> > > > 2017-06-05 15:37:35 INFO Trainer:192 - java.lang.IllegalStateException: Missing serializer for postagger.bin
>>> > > >
>>> > > > Do I have to change the extension of the file?
>>> > > >
>>> > > > Thanks
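
On the "matching names" question raised in this thread, one way to rule out a simple mismatch before training is to compare the keys the descriptor refers to with the keys actually supplied. The following is only a minimal sketch that uses nothing but the JDK: the class ResourceKeyCheck and its methods referencedKeys and missingKeys are hypothetical names, not part of OpenNLP, and it assumes the keys of interest are the dict and model attributes shown in the descriptors above. It checks names only; it cannot tell whether OpenNLP has a serializer registered for a given key, which is what the "Missing serializer" exception is about.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OpenNLP: compares the resource keys that a
// feature generator descriptor refers to with the keys that were supplied.
class ResourceKeyCheck {

    // Collects the values of every "dict" and "model" attribute in the descriptor.
    static Set<String> referencedKeys(String descriptorXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(descriptorXml.getBytes(StandardCharsets.UTF_8)));
        Set<String> keys = new TreeSet<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, root included
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            for (String attribute : new String[] {"dict", "model"}) {
                if (element.hasAttribute(attribute)) {
                    keys.add(element.getAttribute(attribute));
                }
            }
        }
        return keys;
    }

    // Returns the referenced keys that are absent from the supplied keys
    // (for example the map keys, or the file names in the resources directory).
    static Set<String> missingKeys(String descriptorXml, Set<String> suppliedKeys) throws Exception {
        Set<String> missing = referencedKeys(descriptorXml);
        missing.removeAll(suppliedKeys);
        return missing;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<generators>"
                + "<tokenpos model=\"postagger.bin\"/>"
                + "<dictionary dict=\"nations.dictionary\" prefix=\"nation\"/>"
                + "</generators>";
        // Only the POS model is supplied here, so the dictionary key is reported.
        System.out.println(missingKeys(xml, Collections.singleton("postagger.bin")));
    }
}
```

In the programmatic case the supplied keys would be map.keySet(); for the command line tool they would be the file names found in the -resources directory. An empty result only means the names line up, not that training or serialization will succeed.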
