Re: Missing serializer for postagger.bin

Damiano Porta Wed, 07 Jun 2017 07:21:30 -0700

Hello Jorn,
I can confirm the error. Please take a look at the code below. It is a working
example; you only need to edit the constants GENERATORS, POSTAGGER and
SERIALIZED.



*TEST FILE:*

package com.damiano.trainer;

import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import opennlp.tools.ml.perceptron.PerceptronTrainer;
import opennlp.tools.namefind.BioCodec;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.ObjectStreamUtils;
import opennlp.tools.util.TrainingParameters;
import org.apache.commons.io.IOUtils;

public class Test {

    private final String GENERATORS = "/home/damiano/test.xml";
    private final String POSTAGGER = "/home/damiano/postagger.bin";
    private final String SERIALIZED = "/home/damiano/serialized.bin";

    public static void main(String[] args) throws IOException {
        Test test = new Test();
    }

    public Test() throws IOException {

        List<NameSample> labelled = new ArrayList<>();

        labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
        labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));

        TokenNameFinderFactory factory;

        try (ObjectStream<NameSample> samples = ObjectStreamUtils.createObjectStream(labelled)) {
            //HashMap<String, Object> map = new HashMap<>();

            try (InputStream in = new FileInputStream(GENERATORS)) {

                // Resources
                Map<String, Object> map = new HashMap<>();

                // Pos Tagger
                map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));


                // Factory
                factory = new TokenNameFinderFactory(
                    IOUtils.toByteArray(in),
                    map,
                    new BioCodec()
                );

                try {

                    TrainingParameters mlParams = new TrainingParameters();
                    mlParams.put(TrainingParameters.ALGORITHM_PARAM, PerceptronTrainer.PERCEPTRON_VALUE);
                    mlParams.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(300));
                    mlParams.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));

                    TokenNameFinderModel model = NameFinderME.train("it", "person", samples, mlParams, factory);

                    try (BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
                        model.serialize(modelOut);
                    }

                } catch (Exception ex) {
                    ex.printStackTrace();
                }

            }
        }
    }

    public static POSModel loadPosTagger (String modelName) {

        try (InputStream modelIn = new FileInputStream(modelName)) {
            POSModel model = new POSModel(modelIn);
            return model;
        }
        catch (Exception ex) { ex.printStackTrace();  }

        return null;
    }
}
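
A side note on the test class above: loadPosTagger returns null when the model cannot be read, and that null would then be put into the resource map without any warning. To rule that out as a cause, a fail-fast check along these lines could run before the factory is built. This is only a sketch in plain Java; the class ResourceChecks and both of its methods are hypothetical helpers, not part of OpenNLP.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of OpenNLP.
class ResourceChecks {

    // Fails early if the given path is not an existing, readable regular file.
    static Path requireReadableFile(String path) {
        Path p = Paths.get(path);
        if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
            throw new IllegalArgumentException("Not a readable file: " + path);
        }
        return p;
    }

    // Fails early if any resource in the map is null (e.g. a model that failed to load).
    static void requireNoNullValues(Map<String, ?> resources) {
        for (Map.Entry<String, ?> entry : resources.entrySet()) {
            if (entry.getValue() == null) {
                throw new IllegalStateException("Resource is null: " + entry.getKey());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("postagger.bin", null); // what a failed load would leave behind
        try {
            requireNoNullValues(map);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

In the test class this would mean calling requireReadableFile(POSTAGGER) before loading the model and requireNoNullValues(map) before constructing the factory.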

*GENERATORS:*

<?xml version="1.0" encoding="UTF-8"?>
<generators>
    <cache>
        <generators>
            <window prevLength="4" nextLength="2">
                <tokenclass />
            </window>
            <window prevLength="4" nextLength="2">
                <token />
            </window>
            <!-- Pos Tagger -->
            <window prevLength="4" nextLength="2">
                <tokenpos model="postagger.bin" />
            </window>
            <definition />
            <prevmap />
            <bigram />
            <sentence begin="true" end="false" />
        </generators>
    </cache>
</generators>
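
To double-check that the names really match, the model attribute of each tokenpos element can be read from the descriptor and compared with the keys of the resource map. The following is a rough sketch using only the JDK's DOM parser; the class DescriptorModelNames and its methods are hypothetical names made up for this check, not OpenNLP API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of OpenNLP.
class DescriptorModelNames {

    // Collects the "model" attribute of every <tokenpos> element in the descriptor.
    static List<String> tokenposModels(byte[] descriptorXml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(descriptorXml))
                    .getElementsByTagName("tokenpos");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("model"));
            }
            return names;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse descriptor", ex);
        }
    }

    // Returns the referenced model names that have no entry among the resource keys.
    static List<String> missingFrom(Set<String> resourceKeys, List<String> referenced) {
        List<String> missing = new ArrayList<>(referenced);
        missing.removeAll(resourceKeys);
        return missing;
    }

    public static void main(String[] args) {
        String xml = "<generators><window prevLength=\"4\" nextLength=\"2\">"
                + "<tokenpos model=\"postagger.bin\" /></window></generators>";
        List<String> referenced = tokenposModels(xml.getBytes(StandardCharsets.UTF_8));
        // An empty list means every referenced model name has a matching resource key.
        System.out.println(missingFrom(Collections.singleton("postagger.bin"), referenced));
    }
}
```

In the test class, the bytes from IOUtils.toByteArray(in) and map.keySet() could be passed to these two methods before the factory is created.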


*OUTPUT (with error):*


Indexing events using cutoff of 0
Computing event counts...  done. 30 events
Indexing...  done.
Collecting events... Done indexing.
Incorporating indexed data for training...  done.
Number of Event Tokens: 30
Number of Outcomes: 2
Number of Predicates: 144
Computing model parameters...
Performing 300 iterations.
  1:  . (27/30) 0.9
  2:  . (30/30) 1.0
  3:  . (30/30) 1.0
  4:  . (30/30) 1.0
  5:  . (30/30) 1.0
Stopping: change in training set accuracy less than 1.0E-5
Stats: (30/30) 1.0
...done.
Compressed 144 parameters to 621 outcome patterns
java.lang.IllegalStateException: Missing serializer for postagger.bin
    at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
    at com.damiano.trainer.Test.<init>(Test.java:75)
    at com.damiano.trainer.Test.main(Test.java:31)
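
Training completes and the exception is only thrown in serialize(), so the failure is in writing the artifacts, not in the feature generation itself. My working assumption (not verified against the OpenNLP source) is that the serializer for each artifact is looked up by the extension of its resource name, and that nothing is registered for the extension used here. The sketch below is a simplified stand-in for that kind of lookup in plain Java; SerializerLookupSketch, its methods and the serializer labels are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for an extension-keyed serializer registry.
// Not the OpenNLP source; every name in this class is hypothetical.
class SerializerLookupSketch {

    // Maps a file extension (e.g. "featuregen") to a label for its serializer.
    private final Map<String, String> serializersByExtension = new HashMap<>();

    void register(String extension, String serializerLabel) {
        serializersByExtension.put(extension, serializerLabel);
    }

    // Returns the text after the last '.', or null if the name has no extension.
    static String extensionOf(String resourceName) {
        int dot = resourceName.lastIndexOf('.');
        if (dot < 0 || dot == resourceName.length() - 1) {
            return null;
        }
        return resourceName.substring(dot + 1);
    }

    // Looks up the serializer for a resource name; fails if none is registered.
    String serializerFor(String resourceName) {
        String serializer = serializersByExtension.get(extensionOf(resourceName));
        if (serializer == null) {
            throw new IllegalStateException("Missing serializer for " + resourceName);
        }
        return serializer;
    }

    public static void main(String[] args) {
        SerializerLookupSketch registry = new SerializerLookupSketch();
        registry.register("featuregen", "bytes-serializer"); // hypothetical entry
        try {
            registry.serializerFor("postagger.bin"); // nothing registered for "bin"
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

If the assumption holds, the question becomes which component is supposed to register a serializer for the POS model resource when the tokenpos generator is used.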

2017-06-07 15:48 GMT+02:00 Damiano Porta <[email protected]>:

> Hmm let me try again, yes i copied it badly, i think the names are
> correct, i will give you a working example.
>
> 2017-06-07 15:46 GMT+02:00 Joern Kottmann <[email protected]>:
>
>> Ok, but are you sure you used matching names? The exception states
>> it-pos-maxent.bin,
>> which object did you map to it?
>>
>> Jörn
>>
>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta <[email protected]>
>> wrote:
>>
>> > Hi Jorn! Yes
>> >
>> >         <dependency>
>> >             <groupId>org.apache.opennlp</groupId>
>> >             <artifactId>opennlp-tools</artifactId>
>> >             <version>1.8.0</version>
>> >         </dependency>
>> >
>> > Do i need others dependencies too?
>> >
>> >
>> >
>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann <[email protected]>:
>> >
>> > > This should be working. Did you test with 1.8.0?
>> > >
>> > > Jörn
>> > >
>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <[email protected]>
>> > > wrote:
>> > >
>> > > > Hello,
>> > > > i am using the POSTaggerFeatureGenerator via generators.xml
>> > > >
>> > > > <tokenpos model="postagger.bin" />
>> > > >
>> > > > during the training i add this model in the resources doing:
>> > > >
>> > > >         HashMap<String, Object> map = new HashMap<>();
>> > > >         map.put("postagger.bin", myPostaggerModel);
>> > > >
>> > > >
>> > > >          factory = new TokenNameFinderFactory(
>> > > >                IOUtils.toByteArray(in),
>> > > >                map,
>> > > >                new BioCodec()
>> > > >          );
>> > > >
>> > > > I get this error:
>> > > >
>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>> > > > at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>> > > > at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
>> > > > at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
>> > > > 2017-06-05 15:37:35 INFO  Trainer:192 - java.lang.IllegalStateException:
>> > > > Missing serializer for postagger.bin
>> > > >
>> > > > Do i have to change the extension of the file?
>> > > >
>> > > > Thanks
>> > > >
>> > >
>> >
>>
>
>
