[ https://issues.apache.org/jira/browse/OPENNLP-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tommaso Teofili resolved OPENNLP-488.
-------------------------------------
Resolution: Fixed
I've created OPENNLP-837 as a follow-up issue for notifying users when there is
not enough training data, so I'm marking this one as resolved.
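The notification itself is tracked in OPENNLP-837. The idea can be sketched as a fail-fast check that runs after event indexing and before the trainer starts, so that the user sees an explanation instead of a NullPointerException. The class and method names below (`TrainingDataCheck`, `requireEvents`) are hypothetical and are not part of the OpenNLP API.

```java
/**
 * Hypothetical pre-training guard (NOT an OpenNLP API): fail with a
 * descriptive message instead of letting the trainer hit a NullPointerException.
 */
public class TrainingDataCheck {

    /** Throws if no events survived the feature cutoff during indexing. */
    static void requireEvents(int eventsAfterCutoff, int eventsRead, int cutoff) {
        if (eventsAfterCutoff == 0) {
            // If every event was dropped, no feature reached the cutoff count.
            throw new IllegalArgumentException(String.format(
                "Not enough training data: all %d events were dropped because no "
                + "feature occurs at least %d times. Add more training documents "
                + "or lower the cutoff.", eventsRead, cutoff));
        }
    }

    public static void main(String[] args) {
        // Numbers taken from the log in this report: 2 events read, cutoff 5, both dropped.
        try {
            requireEvents(0, 2, 5);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```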
> Doccat training tool throws NullPointer error
> ---------------------------------------------
>
> Key: OPENNLP-488
> URL: https://issues.apache.org/jira/browse/OPENNLP-488
> Project: OpenNLP
> Issue Type: Bug
> Components: Doccat
> Environment: Using cygwin on Windows
> java version "1.6.0_27"
> Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
> Java HotSpot(TM) Client VM (build 20.2-b06, mixed mode)
> apache-opennlp-1.5.2
> Reporter: Erik Andersson
> Assignee: Tommaso Teofili
> Attachments: OPENNLP-488.patch, en-doccat.train
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> When following the example in the OpenNLP 1.5.2 documentation, I get a
> NullPointerException.
> http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.doccat.training.tool
> $ bin/opennlp DoccatTrainer -encoding UTF-8 -lang en -data en-doccat.train -model en-doccat.bin
> Indexing events using cutoff of 5
> Computing event counts... done. 2 events
> Indexing... Dropped event GMDecrease:[bow=Major, bow=acquisitions,
> bow=that, bow=have, bow=a, bow=lower, bow=gross, bow=margin, bow=than,
> bow=the, bow=existing, bow=network, bow=also, bow=had, bow=a, bow=negative,
> bow=impact, bow=on, bow=the, bow=overall, bow=gross, bow=margin,, bow=but,
> bow=it, bow=should, bow=improve, bow=following, bow=the, bow=implementation,
> bow=of, bow=its, bow=integration, bow=strategies, bow=.]
> Dropped event GMIncrease:[bow=The, bow=upward, bow=movement, bow=of,
> bow=gross, bow=margin, bow=resulted, bow=from, bow=amounts, bow=pursuant,
> bow=to, bow=adjustments, bow=to, bow=obligations, bow=towards, bow=dealers,
> bow=.]
> done.
> Sorting and merging events... Done indexing.
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
> at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
> at opennlp.maxent.GIS.trainModel(GIS.java:256)
> at opennlp.model.TrainUtil.train(TrainUtil.java:182)
> at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
> at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
> at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:192)
> at opennlp.tools.cmdline.doccat.DoccatTrainerTool.run(DoccatTrainerTool.java:91)
> at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> The file "en-doccat.train" is UTF-8 encoded with UNIX line endings and
> contains one document per line:
> GMDecrease Major acquisitions that have a lower gross margin than the existing network also had a negative impact on the overall gross margin, but it should improve following the implementation of its integration strategies .
> GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments to obligations towards dealers .
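The log suggests why training fails. With a cutoff of 5, a feature is kept only if it occurs at least five times across the training data, and an event whose features are all discarded is dropped. With both events dropped, the trainer has nothing to work with. The sketch below approximates that counting on the two sample documents. It assumes each line is `<category> <whitespace-tokenized text>` and that each token becomes a `bow=<token>` feature, as in the log above. It is an illustration only, not OpenNLP's indexer code, and `CutoffDemo`/`countSurvivingEvents` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Approximates bag-of-words cutoff filtering on doccat-style lines (illustrative only). */
public class CutoffDemo {

    /** Returns the number of documents that keep at least one feature after the cutoff. */
    static int countSurvivingEvents(List<String> lines, int cutoff) {
        List<String[]> featuresPerDoc = new ArrayList<>();
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            // parts[0] is the category; every remaining token becomes a bow= feature.
            String[] features = new String[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                features[i - 1] = "bow=" + parts[i];
                counts.merge(features[i - 1], 1, Integer::sum); // count every occurrence
            }
            featuresPerDoc.add(features);
        }
        int surviving = 0;
        for (String[] features : featuresPerDoc) {
            // A document survives if at least one of its features reaches the cutoff.
            boolean keep = Arrays.stream(features).anyMatch(f -> counts.get(f) >= cutoff);
            if (keep) {
                surviving++;
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
            "GMDecrease Major acquisitions that have a lower gross margin than the existing network also had a negative impact on the overall gross margin, but it should improve following the implementation of its integration strategies .",
            "GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments to obligations towards dealers .");
        System.out.println("Surviving events at cutoff 5: " + countSurvivingEvents(data, 5)); // 0
        System.out.println("Surviving events at cutoff 1: " + countSurvivingEvents(data, 1)); // 2
    }
}
```

In this data the most frequent features (`bow=the`, `bow=gross`) occur only three times each, so nothing reaches the cutoff of 5. Lowering the cutoff would let both events through in this sketch, but two documents remain far too little data for a meaningful model.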