Doccat training tool throws NullPointer error
---------------------------------------------

                 Key: OPENNLP-488
                 URL: https://issues.apache.org/jira/browse/OPENNLP-488
             Project: OpenNLP
          Issue Type: Bug
          Components: Doccat
         Environment: Using cygwin on Windows
                      java version "1.6.0_27"
                      Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
                      Java HotSpot(TM) Client VM (build 20.2-b06, mixed mode)
                      apache-opennlp-1.5.2
            Reporter: Erik Andersson


When following the example in the OpenNLP 1.5.2 documentation, I get a
NullPointerException during training.

http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.doccat.training.tool

$ bin/opennlp DoccatTrainer -encoding UTF-8 -lang en -data en-doccat.train -model en-doccat.bin
Indexing events using cutoff of 5

        Computing event counts...  done. 2 events
        Indexing...  Dropped event GMDecrease:[bow=Major, bow=acquisitions, bow=that, bow=have, bow=a, bow=lower, bow=gross, bow=margin, bow=than, bow=the, bow=existing, bow=network, bow=also, bow=had, bow=a, bow=negative, bow=impact, bow=on, bow=the, bow=overall, bow=gross, bow=margin,, bow=but, bow=it, bow=should, bow=improve, bow=following, bow=the, bow=implementation, bow=of, bow=its, bow=integration, bow=strategies, bow=.]
Dropped event GMIncrease:[bow=The, bow=upward, bow=movement, bow=of, bow=gross, bow=margin, bow=resulted, bow=from, bow=amounts, bow=pursuant, bow=to, bow=adjustments, bow=to, bow=obligations, bow=towards, bow=dealers, bow=.]
done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
        at opennlp.maxent.GIS.trainModel(GIS.java:256)
        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:192)
        at opennlp.tools.cmdline.doccat.DoccatTrainerTool.run(DoccatTrainerTool.java:91)
        at opennlp.tools.cmdline.CLI.main(CLI.java:191)


The file "en-doccat.train" is UTF-8 encoded with UNIX line endings and contains these two lines:

GMDecrease  Major acquisitions that have a lower gross margin than the existing network also had a negative impact on the overall gross margin, but it should improve following the implementation of its integration strategies .
GMIncrease  The upward movement of gross margin resulted from amounts pursuant to adjustments to obligations towards dealers .
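
One possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: with a cutoff of 5 and only two short documents, no bow= feature occurs often enough to be kept, so both events are dropped and the trainer may be left with nothing to train on. The Python sketch below is not OpenNLP code. Its function names and its counting rule (a feature is kept when it occurs at least `cutoff` times across all events) are illustrative assumptions; it only repeats the feature counting on the two sample lines.

```python
from collections import Counter

CUTOFF = 5  # from the log line "Indexing events using cutoff of 5"

# The two documents from en-doccat.train: category first, then the text.
SAMPLES = [
    "GMDecrease  Major acquisitions that have a lower gross margin than the "
    "existing network also had a negative impact on the overall gross margin, "
    "but it should improve following the implementation of its integration "
    "strategies .",
    "GMIncrease  The upward movement of gross margin resulted from amounts "
    "pursuant to adjustments to obligations towards dealers .",
]


def bow_features(line):
    """Split one training line into (category, list of bow= features)."""
    category, *tokens = line.split()
    return category, ["bow=" + token for token in tokens]


def surviving_events(lines, cutoff):
    """Assumed rule: keep features seen >= cutoff times, drop empty events."""
    events = [bow_features(line) for line in lines]
    counts = Counter(f for _, feats in events for f in feats)
    kept = []
    for category, feats in events:
        frequent = [f for f in feats if counts[f] >= cutoff]
        if frequent:  # an event with no remaining features is dropped
            kept.append((category, frequent))
    return kept, counts


kept, counts = surviving_events(SAMPLES, CUTOFF)
print("highest feature count:", max(counts.values()))
print("events left after cutoff:", len(kept))
```

If this reading is right, a larger training file or a lower cutoff would be the thing to try; neither has been verified here.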


