Doccat training tool throws NullPointerException
------------------------------------------------
Key: OPENNLP-488
URL: https://issues.apache.org/jira/browse/OPENNLP-488
Project: OpenNLP
Issue Type: Bug
Components: Doccat
Environment: Using cygwin on Windows
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) Client VM (build 20.2-b06, mixed mode)
apache-opennlp-1.5.2
Reporter: Erik Andersson
When following the example in the OpenNLP 1.5.2 documentation I get a
NullPointerException.
http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.doccat.training.tool
$ bin/opennlp DoccatTrainer -encoding UTF-8 -lang en -data en-doccat.train -model en-doccat.bin
Indexing events using cutoff of 5
Computing event counts... done. 2 events
Indexing... Dropped event GMDecrease:[bow=Major, bow=acquisitions,
bow=that, bow=have, bow=a, bow=lower, bow=gross, bow=margin, bow=than, bow=the,
bow=existing, bow=network, bow=also, bow=had, bow=a, bow=negative, bow=impact,
bow=on, bow=the, bow=overall, bow=gross, bow=margin,, bow=but, bow=it,
bow=should, bow=improve, bow=following, bow=the, bow=implementation, bow=of,
bow=its, bow=integration, bow=strategies, bow=.]
Dropped event GMIncrease:[bow=The, bow=upward, bow=movement, bow=of, bow=gross,
bow=margin, bow=resulted, bow=from, bow=amounts, bow=pursuant, bow=to,
bow=adjustments, bow=to, bow=obligations, bow=towards, bow=dealers, bow=.]
done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:182)
at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:192)
at opennlp.tools.cmdline.doccat.DoccatTrainerTool.run(DoccatTrainerTool.java:91)
at opennlp.tools.cmdline.CLI.main(CLI.java:191)
The file "en-doccat.train" is UTF-8 encoded with UNIX (LF) line endings and looks like this:
GMDecrease Major acquisitions that have a lower gross margin than the existing network also had a negative impact on the overall gross margin, but it should improve following the implementation of its integration strategies .
GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments to obligations towards dealers .
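
A possible reading of the log, offered as an assumption rather than a confirmed diagnosis: the trainer reports a cutoff of 5, and both events are dropped, which would be consistent with no feature occurring often enough in this two-line file to survive the cutoff. The sketch below is plain Java and does not call OpenNLP. The class CutoffSketch and its methods are hypothetical names, and the rule "keep a feature when its count reaches the cutoff" is an assumption about how the cutoff is applied. It only counts the bow= features of the two samples above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of OpenNLP.
public class CutoffSketch {

    // The two training lines from the report, one document per line.
    static List<String> reportedSamples() {
        return Arrays.asList(
            "GMDecrease Major acquisitions that have a lower gross margin than the"
                + " existing network also had a negative impact on the overall gross"
                + " margin, but it should improve following the implementation of its"
                + " integration strategies .",
            "GMIncrease The upward movement of gross margin resulted from amounts"
                + " pursuant to adjustments to obligations towards dealers .");
    }

    // Counts every bag-of-words feature across all samples.
    // The first token of each line is the category label and is skipped.
    static Map<String, Integer> countFeatures(List<String> samples) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String sample : samples) {
            String[] tokens = sample.trim().split("\\s+");
            for (int i = 1; i < tokens.length; i++) {
                String feature = "bow=" + tokens[i];
                Integer old = counts.get(feature);
                counts.put(feature, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    // Assumed rule: a feature is kept when its count reaches the cutoff.
    static List<String> survivingFeatures(List<String> samples, int cutoff) {
        List<String> kept = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : countFeatures(samples).entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> samples = reportedSamples();
        System.out.println("Features kept at cutoff 5: "
            + survivingFeatures(samples, 5).size());
        System.out.println("Features kept at cutoff 1: "
            + survivingFeatures(samples, 1).size());
    }
}
```

Under these assumptions no feature is kept at a cutoff of 5 (the most frequent ones, bow=the and bow=gross, occur three times each), so every event would end up empty. Whether that is what triggers the NullPointerException in GISTrainer is not established by this sketch.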