Jorn,
the latest snapshot (1.8.1-snapshot) fixes the problem with dictionaries
(PR #220), but the problem with the POS tagger serialization is still there.
I can confirm that the latest snapshot cannot serialize the POS tagger model
when training with the command-line tool:

*opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
-model /home/damiano/it-tuoagente-perceptron-custom.bin -featuregen
/home/damiano/test.xml -sequenceCodec BIO -resources
/home/damiano/lavoro/java/Parser/src/main/resources/*


*Writing name finder model ... Compressed 885605 parameters to 94030*
*3451 outcome patterns*
*Exception in thread "main" java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin*
* at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:592)*
* at opennlp.tools.cmdline.CmdLineUtil.writeModel(CmdLineUtil.java:182)*
* at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)*
* at opennlp.tools.cmdline.CLI.main(CLI.java:244)*
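
One plausible reading of this message is that the serializer for each resource is looked up by the part of the resource name after the last dot, and that nothing is registered for "bin". That is an assumption about the internals, not something confirmed in this thread. The sketch below only illustrates that kind of lookup in plain Java; the class name, the registered extensions and the string stand-ins for serializers are all hypothetical and are not OpenNLP code.

```java
import java.util.HashMap;
import java.util.Map;

public class SerializerLookupSketch {

    // Hypothetical registry: file extension -> serializer (a String stands in for a real serializer).
    private final Map<String, String> serializersByExtension = new HashMap<>();

    public SerializerLookupSketch() {
        // Assumed registrations, chosen for illustration only.
        serializersByExtension.put("dictionary", "dictionary serializer");
        serializersByExtension.put("properties", "properties serializer");
    }

    // Returns the text after the last dot of the resource name.
    public static String extensionOf(String resourceName) {
        int dot = resourceName.lastIndexOf('.');
        if (dot < 0 || dot == resourceName.length() - 1) {
            throw new IllegalArgumentException("Resource name needs an extension: " + resourceName);
        }
        return resourceName.substring(dot + 1);
    }

    // Looks up the serializer by extension and fails if none is registered.
    public String serializerFor(String resourceName) {
        String serializer = serializersByExtension.get(extensionOf(resourceName));
        if (serializer == null) {
            throw new IllegalStateException("Missing serializer for " + resourceName);
        }
        return serializer;
    }
}
```

Under that assumption, a resource named nations.dictionary would find a serializer while it-pos-maxent.bin would not, whatever object is mapped to the name.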

I have used this generators.xml file:

*<?xml version="1.0" encoding="UTF-8"?>*
*<generators>*
*    <cache>*
*        <generators>*
*            <window prevLength="4" nextLength="2">*
*                <tokenclass />*
*            </window>*
*            <window prevLength="4" nextLength="2">*
*                <token />*
*            </window> *
*            <!-- Pos Tagger -->                *
*            <window prevLength="4" nextLength="2">*
*                <tokenpos model="it-pos-maxent.bin" />*
*            </window>       *
*            <definition />*
*            <prevmap />*
*            <bigram />*
*            <sentence begin="true" end="false" />          *
*        </generators>*
*    </cache>*
*</generators>*
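
To rule out a name mismatch between the descriptor and the resources, the keys referenced in the XML can be listed and compared with the file names in the -resources directory (or with the keys of the resources map). This is a minimal sketch using only the JDK XML parser. The class and method names are hypothetical, and it assumes that only the "model" and "dict" attributes name external resources, which is all that appears in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ReferencedResources {

    // Collects the values of all "model" and "dict" attributes in a feature generator descriptor.
    public static Set<String> referencedKeys(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Set<String> keys = new TreeSet<>();
        // "*" matches every element in the document.
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            for (String attribute : new String[] {"model", "dict"}) {
                if (element.hasAttribute(attribute)) {
                    keys.add(element.getAttribute(attribute));
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<generators><cache><generators>"
                + "<window prevLength=\"4\" nextLength=\"2\">"
                + "<tokenpos model=\"it-pos-maxent.bin\" /></window>"
                + "<dictionary dict=\"nations.dictionary\" prefix=\"nation\" />"
                + "</generators></cache></generators>";
        // Prints the resource keys the descriptor expects to find.
        System.out.println(referencedKeys(xml));
    }
}
```

Each key printed this way should correspond exactly to one resource name; any key without a counterpart is a candidate for the "No dictionary resource for key" error quoted below.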




2017-06-09 15:17 GMT+02:00 Damiano Porta <[email protected]>:

> Jorn,
> At the moment I am using the command-line tool to train my NER model, but
> I am getting this error:
>
> *opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
> -model /home/damiano/it-person-perceptron.bin -featuregen
> /home/damiano/test.xml -sequenceCodec BIO -resources
> /home/damiano/lavoro/java/Parser/src/main/resources/*
>
> *Exception in thread "main" opennlp.tools.namefind.TokenNameFinderModel$FeatureGeneratorCreationError: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dictionary*
> at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:209)
> at opennlp.tools.namefind.TokenNameFinderFactory.createContextGenerator(TokenNameFinderFactory.java:150)
> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:241)
> at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
> at opennlp.tools.cmdline.CLI.main(CLI.java:244)
> Caused by: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dict
> at opennlp.tools.util.featuregen.GeneratorFactory$DictionaryFeatureGeneratorFactory.create(GeneratorFactory.java:251)
> at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
> at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> at opennlp.tools.util.featuregen.GeneratorFactory$CachedFeatureGeneratorFactory.create(GeneratorFactory.java:172)
> at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
> at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> at opennlp.tools.util.featuregen.GeneratorFactory.create(GeneratorFactory.java:782)
> at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:189)
> ... 4 more
>
> As you can see, the problem is "No dictionary resource for key:
> nations.dictionary". It appears because I also need to add a dictionary to
> my model.
>
> I ran these tests:
>
> *1. Used nations.dictionary as the resource name, with this in my
> generators.xml: <dictionary dict="nations.dictionary" prefix="nation" />*
>
> *2. Used nations.xml as the resource name, with this in my generators.xml:
> <dictionary dict="nations.xml" prefix="nation" />*
>
> *3. Used nations.dict as the resource name, with this in my generators.xml:
> <dictionary dict="nations.dict" prefix="nation" />*
>
> For each test I also renamed the dictionary file in my -resources
> directory accordingly.
>
> I had no luck.
>
> How should I name a dictionary resource?
>
> Thanks.
>
>
>
> 2017-06-07 16:20 GMT+02:00 Damiano Porta <[email protected]>:
>
>> Hello Jorn,
>> I confirm the error. Please take a look at the code below. It is a
>> working example; you only need to edit the constants GENERATORS, POSTAGGER
>> and SERIALIZED.
>>
>>
>> *TEST FILE:*
>>
>> package com.damiano.trainer;
>>
>> import java.io.BufferedOutputStream;
>> import java.io.FileInputStream;
>> import java.io.FileOutputStream;
>> import java.io.IOException;
>> import java.io.InputStream;
>> import java.util.ArrayList;
>> import java.util.HashMap;
>> import java.util.List;
>> import java.util.Map;
>> import opennlp.tools.ml.perceptron.PerceptronTrainer;
>> import opennlp.tools.namefind.BioCodec;
>> import opennlp.tools.namefind.NameFinderME;
>> import opennlp.tools.namefind.NameSample;
>> import opennlp.tools.namefind.TokenNameFinderFactory;
>> import opennlp.tools.namefind.TokenNameFinderModel;
>> import opennlp.tools.postag.POSModel;
>> import opennlp.tools.util.ObjectStream;
>> import opennlp.tools.util.ObjectStreamUtils;
>> import opennlp.tools.util.TrainingParameters;
>> import org.apache.commons.io.IOUtils;
>>
>> public class Test {
>>
>>     private final String GENERATORS = "/home/damiano/test.xml";
>>     private final String POSTAGGER = "/home/damiano/postagger.bin";
>>     private final String SERIALIZED = "/home/damiano/serialized.bin";
>>
>>     public static void main(String[] args) throws IOException {
>>         Test test = new Test();
>>     }
>>
>>     public Test() throws IOException {
>>
>>         List<NameSample> labelled = new ArrayList<>();
>>
>>         labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
>>         labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));
>>
>>         TokenNameFinderFactory factory;
>>
>>         try (ObjectStream<NameSample> samples =
>> ObjectStreamUtils.createObjectStream(labelled)) {
>>             //HashMap<String, Object> map = new HashMap<>();
>>
>>             try (InputStream in = new FileInputStream(GENERATORS)) {
>>
>>                 // Resources
>>                 Map<String, Object> map = new HashMap<>();
>>
>>                 // Pos Tagger
>>                 map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));
>>
>>
>>                 // Factory
>>                 factory = new TokenNameFinderFactory(
>>                     IOUtils.toByteArray(in),
>>                     map,
>>                     new BioCodec()
>>                 );
>>
>>                 try {
>>
>>                     TrainingParameters mlParams = new
>> TrainingParameters();
>>                     mlParams.put(TrainingParameters.ALGORITHM_PARAM,
>> PerceptronTrainer.PERCEPTRON_VALUE);
>>                     mlParams.put(TrainingParameters.ITERATIONS_PARAM,
>> Integer.toString(300));
>>                     mlParams.put(TrainingParameters.CUTOFF_PARAM,
>> Integer.toString(0));
>>
>>                     TokenNameFinderModel model = NameFinderME.train("it",
>> "person", samples, mlParams, factory);
>>
>>                     try (BufferedOutputStream modelOut = new
>> BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
>>                         model.serialize(modelOut);
>>                     }
>>
>>                 } catch (Exception ex) {
>>                     ex.printStackTrace();
>>                 }
>>
>>             }
>>         }
>>     }
>>
>>     public static POSModel loadPosTagger (String modelName) {
>>
>>         try (InputStream modelIn = new FileInputStream(modelName)) {
>>             POSModel model = new POSModel(modelIn);
>>             return model;
>>         }
>>         catch (Exception ex) { ex.printStackTrace();  }
>>
>>         return null;
>>     }
>> }
>>
>> *GENERATORS:*
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <generators>
>>     <cache>
>>         <generators>
>>             <window prevLength="4" nextLength="2">
>>                 <tokenclass />
>>             </window>
>>             <window prevLength="4" nextLength="2">
>>                 <token />
>>             </window>
>>             <!-- Pos Tagger -->
>>             <window prevLength="4" nextLength="2">
>>                 <tokenpos model="postagger.bin" />
>>             </window>
>>             <definition />
>>             <prevmap />
>>             <bigram />
>>             <sentence begin="true" end="false" />
>>         </generators>
>>     </cache>
>> </generators>
>>
>>
>> *OUTPUT (with error):*
>>
>>
>> Indexing events using cutoff of 0
>> Computing event counts...  done. 30 events
>> Indexing...  done.
>> Collecting events... Done indexing.
>> Incorporating indexed data for training...  done.
>> Number of Event Tokens: 30
>> Number of Outcomes: 2
>> Number of Predicates: 144
>> Computing model parameters...
>> Performing 300 iterations.
>>   1:  . (27/30) 0.9
>>   2:  . (30/30) 1.0
>>   3:  . (30/30) 1.0
>>   4:  . (30/30) 1.0
>>   5:  . (30/30) 1.0
>> Stopping: change in training set accuracy less than 1.0E-5
>> Stats: (30/30) 1.0
>> ...done.
>> Compressed 144 parameters to 621 outcome patterns
>> java.lang.IllegalStateException: Missing serializer for postagger.bin
>> at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>> at com.damiano.trainer.Test.<init>(Test.java:75)
>> at com.damiano.trainer.Test.main(Test.java:31)
>>
>> 2017-06-07 15:48 GMT+02:00 Damiano Porta <[email protected]>:
>>
>>> Hmm, let me try again. Yes, I copied it badly; I think the names are
>>> correct. I will send you a working example.
>>>
>>> 2017-06-07 15:46 GMT+02:00 Joern Kottmann <[email protected]>:
>>>
>>>> Ok, but are you sure you used matching names? The exception states
>>>> it-pos-maxent.bin,
>>>> which object did you map to it?
>>>>
>>>> Jörn
>>>>
>>>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta <[email protected]>
>>>> wrote:
>>>>
>>>> > Hi Jorn! Yes
>>>> >
>>>> >         <dependency>
>>>> >             <groupId>org.apache.opennlp</groupId>
>>>> >             <artifactId>opennlp-tools</artifactId>
>>>> >             <version>1.8.0</version>
>>>> >         </dependency>
>>>> >
>>>> > Do I need other dependencies too?
>>>> >
>>>> >
>>>> >
>>>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann <[email protected]>:
>>>> >
>>>> > > This should be working. Did you test with 1.8.0?
>>>> > >
>>>> > > Jörn
>>>> > >
>>>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <[email protected]>
>>>> > > wrote:
>>>> > >
>>>> > > > Hello,
>>>> > > > i am using the POSTaggerFeatureGenerator via generators.xml
>>>> > > >
>>>> > > > <tokenpos model="postagger.bin" />
>>>> > > >
>>>> > > > During training I add this model to the resources like this:
>>>> > > >
>>>> > > >         HashMap<String, Object> map = new HashMap<>();
>>>> > > >         map.put("postagger.bin", myPostaggerModel);
>>>> > > >
>>>> > > >
>>>> > > >          factory = new TokenNameFinderFactory(
>>>> > > >                IOUtils.toByteArray(in),
>>>> > > >                map,
>>>> > > >                new BioCodec()
>>>> > > >          );
>>>> > > >
>>>> > > > I get this error:
>>>> > > >
>>>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>>>> > > > at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>>>> > > > at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
>>>> > > > at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
>>>> > > > 2017-06-05 15:37:35 INFO  Trainer:192 - java.lang.IllegalStateException:
>>>> > > > Missing serializer for postagger.bin
>>>> > > >
>>>> > > > Do I have to change the extension of the file?
>>>> > > >
>>>> > > > Thanks
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>
