Jorn, the latest snapshot (1.8.1-snapshot) has fixed the problem with dictionaries (PR #220), but the problem with POS tagger serialization is still there. I can confirm that the latest snapshot cannot serialize the POS tagger using the command-line tool:
opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it -model /home/damiano/it-tuoagente-perceptron-custom.bin -featuregen /home/damiano/test.xml -sequenceCodec BIO -resources /home/damiano/lavoro/java/Parser/src/main/resources/

Writing name finder model ... Compressed 885605 parameters to 94030
3451 outcome patterns
Exception in thread "main" java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
    at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:592)
    at opennlp.tools.cmdline.CmdLineUtil.writeModel(CmdLineUtil.java:182)
    at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
    at opennlp.tools.cmdline.CLI.main(CLI.java:244)

I used this generators.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<generators>
  <cache>
    <generators>
      <window prevLength="4" nextLength="2">
        <tokenclass />
      </window>
      <window prevLength="4" nextLength="2">
        <token />
      </window>
      <!-- Pos Tagger -->
      <window prevLength="4" nextLength="2">
        <tokenpos model="it-pos-maxent.bin" />
      </window>
      <definition />
      <prevmap />
      <bigram />
      <sentence begin="true" end="false" />
    </generators>
  </cache>
</generators>

2017-06-09 15:17 GMT+02:00 Damiano Porta:

> Jorn,
> At the moment I am using the command-line tool to train my NER model, but I am
> getting this error:
>
> opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it -model /home/damiano/it-person-perceptron.bin -featuregen /home/damiano/test.xml -sequenceCodec BIO -resources /home/damiano/lavoro/java/Parser/src/main/resources/
>
> Exception in thread "main" opennlp.tools.namefind.TokenNameFinderModel$FeatureGeneratorCreationError: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dictionary
>     at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:209)
>     at opennlp.tools.namefind.TokenNameFinderFactory.createContextGenerator(TokenNameFinderFactory.java:150)
>     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:241)
>     at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
>     at opennlp.tools.cmdline.CLI.main(CLI.java:244)
> Caused by: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dict
>     at opennlp.tools.util.featuregen.GeneratorFactory$DictionaryFeatureGeneratorFactory.create(GeneratorFactory.java:251)
>     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
>     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>     at opennlp.tools.util.featuregen.GeneratorFactory$CachedFeatureGeneratorFactory.create(GeneratorFactory.java:172)
>     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
>     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>     at opennlp.tools.util.featuregen.GeneratorFactory.create(GeneratorFactory.java:782)
>     at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:189)
>     ... 4 more
>
> As you can see, the problem is "No dictionary resource for key: nations.dictionary",
> because I also need to add a dictionary inside my model.
>
> I did these tests:
>
> 1. Named the resource nations.dictionary and used
>    <dictionary dict="nations.dictionary" prefix="nation" /> in my generators.xml.
>
> 2. Named the resource nations.xml and used
>    <dictionary dict="nations.xml" prefix="nation" /> in my generators.xml.
>
> 3. Named the resource nations.dict and used
>    <dictionary dict="nations.dict" prefix="nation" /> in my generators.xml.
>
> For each test I also renamed the dictionary file inside my -resources
> directory accordingly.
>
> I had no luck.
>
> How should I name a dictionary resource?
>
> Thanks.
>
> 2017-06-07 16:20 GMT+02:00 Damiano Porta:
>
>> Hello Jorn,
>> I can confirm the error. Please take a look at the code below. It is a
>> working example; you only need to edit the constants GENERATORS, POSTAGGER
>> and SERIALIZED.
>>
>> TEST FILE:
>>
>> package com.damiano.trainer;
>>
>> import java.io.BufferedOutputStream;
>> import java.io.FileInputStream;
>> import java.io.FileOutputStream;
>> import java.io.IOException;
>> import java.io.InputStream;
>> import java.util.ArrayList;
>> import java.util.HashMap;
>> import java.util.List;
>> import java.util.Map;
>> import opennlp.tools.ml.perceptron.PerceptronTrainer;
>> import opennlp.tools.namefind.BioCodec;
>> import opennlp.tools.namefind.NameFinderME;
>> import opennlp.tools.namefind.NameSample;
>> import opennlp.tools.namefind.TokenNameFinderFactory;
>> import opennlp.tools.namefind.TokenNameFinderModel;
>> import opennlp.tools.postag.POSModel;
>> import opennlp.tools.util.ObjectStream;
>> import opennlp.tools.util.ObjectStreamUtils;
>> import opennlp.tools.util.TrainingParameters;
>> import org.apache.commons.io.IOUtils;
>>
>> public class Test {
>>
>>     private final String GENERATORS = "/home/damiano/test.xml";
>>     private final String POSTAGGER = "/home/damiano/postagger.bin";
>>     private final String SERIALIZED = "/home/damiano/serialized.bin";
>>
>>     public static void main(String[] args) throws IOException {
>>         Test test =
>>                 new Test();
>>     }
>>
>>     public Test() throws IOException {
>>
>>         List<NameSample> labelled = new ArrayList<>();
>>
>>         labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));
>>
>>         TokenNameFinderFactory factory;
>>
>>         try (ObjectStream<NameSample> samples =
>>                 ObjectStreamUtils.createObjectStream(labelled)) {
>>             //HashMap<String, Object> map = new HashMap<>();
>>
>>             try (InputStream in = new FileInputStream(GENERATORS)) {
>>
>>                 // Resources
>>                 Map<String, Object> map = new HashMap<>();
>>
>>                 // Pos Tagger
>>                 map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));
>>
>>                 // Factory
>>                 factory = new TokenNameFinderFactory(
>>                         IOUtils.toByteArray(in),
>>                         map,
>>                         new BioCodec()
>>                 );
>>
>>                 try {
>>
>>                     TrainingParameters mlParams = new TrainingParameters();
>>                     mlParams.put(TrainingParameters.ALGORITHM_PARAM,
>>                             PerceptronTrainer.PERCEPTRON_VALUE);
>>                     mlParams.put(TrainingParameters.ITERATIONS_PARAM,
>>                             Integer.toString(300));
>>                     mlParams.put(TrainingParameters.CUTOFF_PARAM,
>>                             Integer.toString(0));
>>
>>                     TokenNameFinderModel model = NameFinderME.train("it",
>>                             "person", samples, mlParams, factory);
>>
>>                     try (BufferedOutputStream modelOut =
>>                             new BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
>>                         model.serialize(modelOut);
>>                     }
>>
>>                 } catch (Exception ex) {
>>                     ex.printStackTrace();
>>                 }
>>
>>             }
>>         }
>>     }
>>
>>     public static POSModel loadPosTagger(String modelName) {
>>
>>         try (InputStream modelIn = new FileInputStream(modelName)) {
>>             POSModel model =
>>                     new POSModel(modelIn);
>>             return model;
>>         } catch (Exception ex) {
>>             ex.printStackTrace();
>>         }
>>
>>         return null;
>>     }
>> }
>>
>> GENERATORS:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <generators>
>>   <cache>
>>     <generators>
>>       <window prevLength="4" nextLength="2">
>>         <tokenclass />
>>       </window>
>>       <window prevLength="4" nextLength="2">
>>         <token />
>>       </window>
>>       <!-- Pos Tagger -->
>>       <window prevLength="4" nextLength="2">
>>         <tokenpos model="postagger.bin" />
>>       </window>
>>       <definition />
>>       <prevmap />
>>       <bigram />
>>       <sentence begin="true" end="false" />
>>     </generators>
>>   </cache>
>> </generators>
>>
>> OUTPUT (with error):
>>
>> Indexing events using cutoff of 0
>>     Computing event counts...  done. 30 events
>>     Indexing...  done.
>> Collecting events... Done indexing.
>> Incorporating indexed data for training...  done.
>>     Number of Event Tokens: 30
>>     Number of Outcomes: 2
>>     Number of Predicates: 144
>> Computing model parameters...
>> Performing 300 iterations.
>>   1:  . (27/30) 0.9
>>   2:  . (30/30) 1.0
>>   3:  . (30/30) 1.0
>>   4:  . (30/30) 1.0
>>   5:  . (30/30) 1.0
>> Stopping: change in training set accuracy less than 1.0E-5
>> Stats: (30/30) 1.0
>> ...done.
>> Compressed 144 parameters to 62
>> 1 outcome patterns
>> java.lang.IllegalStateException: Missing serializer for postagger.bin
>>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>>     at com.damiano.trainer.Test.<init>(Test.java:75)
>>     at com.damiano.trainer.Test.main(Test.java:31)
>>
>> 2017-06-07 15:48 GMT+02:00 Damiano Porta:
>>
>>> Hmm, let me try again. Yes, I copied it badly; I think the names are
>>> correct. I will give you a working example.
>>>
>>> 2017-06-07 15:46 GMT+02:00 Joern Kottmann:
>>>
>>>> Ok, but are you sure you used matching names? The exception states
>>>> it-pos-maxent.bin; which object did you map to it?
>>>>
>>>> Jörn
>>>>
>>>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta wrote:
>>>>
>>>> > Hi Jorn!
>>>> > Yes:
>>>> >
>>>> > <dependency>
>>>> >     <groupId>org.apache.opennlp</groupId>
>>>> >     <artifactId>opennlp-tools</artifactId>
>>>> >     <version>1.8.0</version>
>>>> > </dependency>
>>>> >
>>>> > Do I need other dependencies too?
>>>> >
>>>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann:
>>>> >
>>>> > > This should be working. Did you test with 1.8.0?
>>>> > >
>>>> > > Jörn
>>>> > >
>>>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta wrote:
>>>> > >
>>>> > > > Hello,
>>>> > > > I am using the POSTaggerFeatureGenerator via generators.xml:
>>>> > > >
>>>> > > > <tokenpos model="postagger.bin" />
>>>> > > >
>>>> > > > During training I add this model to the resources:
>>>> > > >
>>>> > > > HashMap<String, Object> map = new HashMap<>();
>>>> > > > map.put("postagger.bin", myPostaggerModel);
>>>> > > >
>>>> > > > factory = new TokenNameFinderFactory(
>>>> > > >         IOUtils.toByteArray(in),
>>>> > > >         map,
>>>> > > >         new BioCodec()
>>>> > > > );
>>>> > > >
>>>> > > > I get this error:
>>>> > > >
>>>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>>>> > > >     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>>>> > > >     at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
>>>> > > >     at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
>>>> > > > 2017-06-05 15:37:35 INFO Trainer:192 - java.lang.IllegalStateException: Missing serializer for postagger.bin
>>>> > > >
>>>> > > > Do I have to change the extension of the file?
>>>> > > >
>>>> > > > Thanks
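Jörn's question about matching names can be checked mechanically before training. The sketch below is a hypothetical helper (`ResourceKeyCheck` and `findMissingResources` are not part of OpenNLP) that parses a generators XML descriptor with the JDK's DOM parser, collects the `model` attribute of every `<tokenpos>` element and the `dict` attribute of every `<dictionary>` element, and reports the names that have no key in the resource map handed to `TokenNameFinderFactory`. It assumes those two attributes are the only resource references in the descriptor, as in the XML quoted in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical diagnostic helper, not OpenNLP code. */
public class ResourceKeyCheck {

    // Element name -> attribute naming a resource (assumed from the XML in this thread).
    private static final Map<String, String> RESOURCE_ATTRIBUTES = new LinkedHashMap<>();
    static {
        RESOURCE_ATTRIBUTES.put("tokenpos", "model");
        RESOURCE_ATTRIBUTES.put("dictionary", "dict");
    }

    /** Returns, sorted, the referenced resource names that are not keys of the resource map. */
    public static Set<String> findMissingResources(String descriptorXml, Set<String> resourceKeys) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
                    new ByteArrayInputStream(descriptorXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Cannot parse feature generator descriptor", ex);
        }

        Set<String> missing = new TreeSet<>();
        for (Map.Entry<String, String> entry : RESOURCE_ATTRIBUTES.entrySet()) {
            // getElementsByTagName searches the whole tree, including nested <generators>.
            NodeList nodes = doc.getElementsByTagName(entry.getKey());
            for (int i = 0; i < nodes.getLength(); i++) {
                // getAttribute returns "" when the attribute is absent.
                String name = ((Element) nodes.item(i)).getAttribute(entry.getValue());
                if (!name.isEmpty() && !resourceKeys.contains(name)) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String xml = "<generators><cache><generators>"
                + "<window prevLength=\"4\" nextLength=\"2\"><tokenpos model=\"it-pos-maxent.bin\"/></window>"
                + "<dictionary dict=\"nations.dictionary\" prefix=\"nation\"/>"
                + "</generators></cache></generators>";
        // The map is keyed "postagger.bin", but the descriptor asks for "it-pos-maxent.bin".
        Set<String> keys = new HashSet<>(Arrays.asList("postagger.bin", "nations.dictionary"));
        System.out.println(findMissingResources(xml, keys)); // prints [it-pos-maxent.bin]
    }
}
```

An empty result only rules out the naming mismatch Jörn asked about; it says nothing about the serialization error itself, which the thread reports persisting with matching names.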

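The error text and the question about changing the file extension both hinge on how a serializer is chosen for each resource stored in the model. The following is a toy model, not OpenNLP code: it assumes that serializers are looked up by the resource name's file extension (an assumption about OpenNLP's `BaseModel`, not something the thread states), and it shows how a resource whose extension has no registered serializer produces a message of the same shape as `Missing serializer for postagger.bin`. The class `ExtensionSerializerRegistry`, its `Serializer` interface, and the `dictionary` registration are hypothetical illustrations; OpenNLP's real set of registered extensions may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not OpenNLP code) of extension-keyed artifact serialization. */
public class ExtensionSerializerRegistry {

    /** Hypothetical stand-in for a per-type artifact serializer. */
    interface Serializer {
        void serialize(Object artifact, OutputStream out) throws IOException;
    }

    private final Map<String, Serializer> serializersByExtension = new HashMap<>();

    /** Registers a serializer for one extension, e.g. "dictionary". */
    void register(String extension, Serializer serializer) {
        serializersByExtension.put(extension, serializer);
    }

    /** Returns the text after the last '.', or "" when the name has no dot. */
    static String extensionOf(String resourceName) {
        int dot = resourceName.lastIndexOf('.');
        return dot < 0 ? "" : resourceName.substring(dot + 1);
    }

    /** Picks the serializer by extension; an unknown extension yields the thread's error message. */
    void serialize(String resourceName, Object artifact, OutputStream out) throws IOException {
        Serializer serializer = serializersByExtension.get(extensionOf(resourceName));
        if (serializer == null) {
            throw new IllegalStateException("Missing serializer for " + resourceName);
        }
        serializer.serialize(artifact, out);
    }

    public static void main(String[] args) throws IOException {
        ExtensionSerializerRegistry registry = new ExtensionSerializerRegistry();
        // Illustrative registration only; this is not OpenNLP's list of extensions.
        registry.register("dictionary",
                (value, stream) -> stream.write(value.toString().getBytes(StandardCharsets.UTF_8)));

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        registry.serialize("nations.dictionary", "Italia", buffer); // "dictionary" has a serializer
        try {
            registry.serialize("postagger.bin", new Object(), buffer); // nothing registered for "bin"
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // prints: Missing serializer for postagger.bin
        }
    }
}
```

Under this model, renaming the resource only helps if the new extension happens to map to a serializer that can write a `POSModel`; the thread does not establish which extension, if any, would work with 1.8.0.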