Hello Jorn,
I confirm the error. Please take a look at the code below. It is a working
example; you only need to edit the constants GENERATORS, POSTAGGER and
SERIALIZED.
*TEST FILE:*
package com.damiano.trainer;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import opennlp.tools.ml.perceptron.PerceptronTrainer;
import opennlp.tools.namefind.BioCodec;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.ObjectStreamUtils;
import opennlp.tools.util.TrainingParameters;
import org.apache.commons.io.IOUtils;
public class Test {
private final String GENERATORS = "/home/damiano/test.xml";
private final String POSTAGGER = "/home/damiano/postagger.bin";
private final String SERIALIZED = "/home/damiano/serialized.bin";
public static void main(String[] args) throws IOException {
Test test = new Test();
}
public Test() throws IOException {
List<NameSample> labelled = new ArrayList<>();
labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));
TokenNameFinderFactory factory;
try (ObjectStream<NameSample> samples =
ObjectStreamUtils.createObjectStream(labelled)) {
//HashMap<String, Object> map = new HashMap<>();
try (InputStream in = new FileInputStream(GENERATORS)) {
// Resources
Map<String, Object> map = new HashMap<>();
// Pos Tagger
map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));
// Factory
factory = new TokenNameFinderFactory(
IOUtils.toByteArray(in),
map,
new BioCodec()
);
try {
TrainingParameters mlParams = new TrainingParameters();
mlParams.put(TrainingParameters.ALGORITHM_PARAM,
PerceptronTrainer.PERCEPTRON_VALUE);
mlParams.put(TrainingParameters.ITERATIONS_PARAM,
Integer.toString(300));
mlParams.put(TrainingParameters.CUTOFF_PARAM,
Integer.toString(0));
TokenNameFinderModel model = NameFinderME.train("it",
"person", samples, mlParams, factory);
try (BufferedOutputStream modelOut = new
BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
model.serialize(modelOut);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
}
// Propagate load failures instead of returning null, so that a missing or
// unreadable POS model is never put into the resource map silently.
public static POSModel loadPosTagger(String modelName) throws IOException {
try (InputStream modelIn = new FileInputStream(modelName)) {
return new POSModel(modelIn);
}
}
}
*GENERATORS:*
<?xml version="1.0" encoding="UTF-8"?>
<generators>
<cache>
<generators>
<window prevLength="4" nextLength="2">
<tokenclass />
</window>
<window prevLength="4" nextLength="2">
<token />
</window>
<!-- Pos Tagger -->
<window prevLength="4" nextLength="2">
<tokenpos model="postagger.bin" />
</window>
<definition />
<prevmap />
<bigram />
<sentence begin="true" end="false" />
</generators>
</cache>
</generators>
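Since the question came up whether the key in the resource map matches the name used in the descriptor, here is a minimal sketch of how that could be checked before training. It uses only the JDK XML parser. The class ResourceKeyCheck and its methods are hypothetical helpers, not part of OpenNLP, and the sketch assumes that resources are referenced through a model attribute, as in the tokenpos element above. It can only rule out a naming mismatch; it says nothing about how the serializer is looked up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, NOT part of OpenNLP.
final class ResourceKeyCheck {

    // Collects the value of every "model" attribute found in the descriptor.
    // Assumption: resources are referenced via a "model" attribute only.
    static Set<String> referencedModels(byte[] descriptorXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(descriptorXml));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse descriptor", e);
        }
        Set<String> names = new LinkedHashSet<>();
        NodeList all = doc.getElementsByTagName("*"); // every element in the document
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (element.hasAttribute("model")) {
                names.add(element.getAttribute("model"));
            }
        }
        return names;
    }

    // Returns the referenced names that have no entry in the resource map.
    static Set<String> missingKeys(byte[] descriptorXml, Map<String, ?> resources) {
        Set<String> missing = referencedModels(descriptorXml);
        missing.removeAll(resources.keySet());
        return missing;
    }

    public static void main(String[] args) {
        // Shortened descriptor with the same tokenpos reference as above.
        String xml = "<generators><cache><generators>"
                + "<window prevLength=\"4\" nextLength=\"2\">"
                + "<tokenpos model=\"postagger.bin\"/>"
                + "</window>"
                + "</generators></cache></generators>";
        Map<String, Object> resources = new HashMap<>();
        resources.put("postagger.bin", new Object()); // stand-in for the POSModel
        System.out.println("Missing keys: "
                + missingKeys(xml.getBytes(StandardCharsets.UTF_8), resources));
    }
}
```

In the test file the same check could be run on the bytes read from GENERATORS and on the map, right before the factory is created. An empty result would mean the names match and the problem lies elsewhere.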
*OUTPUT (with error):*
Indexing events using cutoff of 0
Computing event counts... done. 30 events
Indexing... done.
Collecting events... Done indexing.
Incorporating indexed data for training... done.
Number of Event Tokens: 30
Number of Outcomes: 2
Number of Predicates: 144
Computing model parameters...
Performing 300 iterations.
1: . (27/30) 0.9
2: . (30/30) 1.0
3: . (30/30) 1.0
4: . (30/30) 1.0
5: . (30/30) 1.0
Stopping: change in training set accuracy less than 1.0E-5
Stats: (30/30) 1.0
...done.
Compressed 144 parameters to 621 outcome patterns
java.lang.IllegalStateException: Missing serializer for postagger.bin
    at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
    at com.damiano.trainer.Test.<init>(Test.java:75)
    at com.damiano.trainer.Test.main(Test.java:31)
2017-06-07 15:48 GMT+02:00 Damiano Porta <[email protected]>:
> Hmm, let me try again. Yes, I copied it badly; I think the names are
> correct. I will give you a working example.
>
> 2017-06-07 15:46 GMT+02:00 Joern Kottmann <[email protected]>:
>
>> Ok, but are you sure you used matching names? The exception states
>> it-pos-maxent.bin,
>> which object did you map to it?
>>
>> Jörn
>>
>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta <[email protected]>
>> wrote:
>>
>> > Hi Jorn! Yes
>> >
>> > <dependency>
>> > <groupId>org.apache.opennlp</groupId>
>> > <artifactId>opennlp-tools</artifactId>
>> > <version>1.8.0</version>
>> > </dependency>
>> >
>> > Do I need other dependencies too?
>> >
>> >
>> >
>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann <[email protected]>:
>> >
>> > > This should be working. Did you test with 1.8.0?
>> > >
>> > > Jörn
>> > >
>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <[email protected]>
>> > > wrote:
>> > >
>> > > > Hello,
>> > > > I am using the POSTaggerFeatureGenerator via generators.xml:
>> > > >
>> > > > <tokenpos model="postagger.bin" />
>> > > >
>> > > > During training I add this model to the resources like this:
>> > > >
>> > > > HashMap<String, Object> map = new HashMap<>();
>> > > > map.put("postagger.bin", myPostaggerModel);
>> > > >
>> > > >
>> > > > factory = new TokenNameFinderFactory(
>> > > > IOUtils.toByteArray(in),
>> > > > map,
>> > > > new BioCodec()
>> > > > );
>> > > >
>> > > > I get this error:
>> > > >
>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>> > > > at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>> > > > at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
>> > > > at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
>> > > > 2017-06-05 15:37:35 INFO Trainer:192 - java.lang.IllegalStateException:
>> > > > Missing serializer for postagger.bin
>> > > >
>> > > > Do I have to change the extension of the file?
>> > > >
>> > > > Thanks
>> > > >
>> > >
>> >
>>
>
>