[ https://issues.apache.org/jira/browse/OPENNLP-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781817#comment-13781817 ]
Joern Kottmann commented on OPENNLP-597:
----------------------------------------
Sorry for the delay. Case two looks good to me; the parsed label will simply be
NP-2. In my opinion, case one and case three should throw some kind of parse
exception, because the missing label or token will crash code that depends on it.
What do you think?
Another option would be to set either the token or the label to null; as far as
I can see, there is no way to tell which one is missing. I am not sure that is a
good idea, since it will be tricky to write code that uses these incomplete parse
trees. I therefore believe that the data needs to be cleaned at some point
anyway, so why not do this before it is passed to the parser?
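To make the "clean the data first" idea concrete, here is a rough sketch of a
pre-check that could run over each training line before it reaches the parser.
It is only an illustration: the class TreebankLineCheck and its methods are
hypothetical names, not part of OpenNLP, and the check assumes that any bracket
pair holding exactly one symbol (such as "( -RRB-)") is a node with a missing
label or token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an OpenNLP class.
public class TreebankLineCheck {

    // A bracket pair that holds exactly one symbol, e.g. "( -RRB-)".
    // A well-formed leaf holds two symbols: "(-RRB- -RRB-)".
    private static final Pattern SINGLE_SYMBOL_NODE =
        Pattern.compile("\\(\\s*([^\\s()]+)\\s*\\)");

    // Returns every node in the line that has only one symbol.
    public static List<String> findIncompleteNodes(String line) {
        List<String> found = new ArrayList<String>();
        Matcher m = SINGLE_SYMBOL_NODE.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Fails fast instead of letting a null label or token travel further.
    public static void requireComplete(String line) {
        List<String> bad = findIncompleteNodes(line);
        if (!bad.isEmpty()) {
            throw new IllegalArgumentException(
                "Incomplete parse nodes (label or token missing): " + bad);
        }
    }

    public static void main(String[] args) {
        String good = "(TOP (NP (-LRB- -LRB-) (NN example) (-RRB- -RRB-)))";
        String bad = "(TOP (NP (-LRB- -LRB-) (NN example) ( -RRB-)))";
        System.out.println(findIncompleteNodes(good));
        System.out.println(findIncompleteNodes(bad));
    }
}
```

Note that the check cannot tell whether the label or the token is the missing
part; it only reports the offending node so the line can be fixed or skipped.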
> Code in tools/parser throws some NullPointerExceptions when dealing with poor
> training data
> -------------------------------------------------------------------------------------------
>
> Key: OPENNLP-597
> URL: https://issues.apache.org/jira/browse/OPENNLP-597
> Project: OpenNLP
> Issue Type: Bug
> Components: Parser
> Affects Versions: tools-1.5.3
> Environment: Windows 7 + java 1.7.0_21
> Reporter: Ioan Barbulescu
> Priority: Minor
> Fix For: 1.6.0
>
> Attachments: tools.patch
>
>
> I was trying to train the Treebank Parser with some new data.
> Truth be told, the data was poorly formatted. Specifically, instead of
> "(-RRB- -RRB-)", it contained "( -RRB-)".
> The same was true for -LRB- constructions.
> Because of this input data, the parsing code threw
> NullPointerExceptions.
> The fixes consist of some additional "if()" checks to guard against null
> pointers.
> Fixes are in 3 files, attached as diff. The diff was created by svn, run in
> the opennlp-tool/.../parser directory.