[ 
https://issues.apache.org/jira/browse/OPENNLP-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781817#comment-13781817
 ] 

Joern Kottmann commented on OPENNLP-597:
----------------------------------------

Sorry for the delay. Case two looks good to me; the parsed label will simply
be NP-2. In my opinion, cases one and three should throw some kind of parse
exception, because the missing label/token will make any code that depends on
it crash.

What do you think?
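
To make the exception idea concrete, here is a minimal sketch in plain Java. It
is not taken from the attached patch or from the OpenNLP code base: the names
MalformedParseException and LeafChecker are hypothetical, and it assumes that
every leaf node must have the form "(label token)" and that literal brackets in
the text are escaped as -LRB-/-RRB-.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical exception type for this sketch; not part of OpenNLP. */
class MalformedParseException extends IllegalArgumentException {
    MalformedParseException(String message) {
        super(message);
    }
}

/** Hypothetical helper for this sketch; not part of OpenNLP. */
final class LeafChecker {

    // Matches an innermost bracket group, i.e. one without nested parentheses.
    private static final Pattern LEAF = Pattern.compile("\\(([^()]*)\\)");

    private LeafChecker() {
    }

    /** Throws if any leaf node is not exactly a label followed by a token. */
    static void requireCompleteLeaves(String parse) {
        Matcher m = LEAF.matcher(parse);
        while (m.find()) {
            String body = m.group(1).trim();
            int fields = body.isEmpty() ? 0 : body.split("\\s+").length;
            if (fields != 2) {
                throw new MalformedParseException(
                    "Expected \"(label token)\" but found \"" + m.group()
                        + "\" at offset " + m.start());
            }
        }
    }
}
```

With this check, "(TOP (NP (NN dog) (-RRB- -RRB-)))" passes, while a string
containing "( -RRB-)" is rejected with a message pointing at the offending
node, instead of failing later with a NullPointerException.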

Another option would be to set either the token or the label to null, although
as far as I can see there is no way to tell which of the two is missing. I am
not sure that is a good idea, since it would be tricky to write code that uses
these incomplete parse trees. I believe the data needs to be cleaned at some
point anyway, so why not do this before it is passed to the parser?
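
As a rough illustration of cleaning the data up front, the sketch below filters
training lines before they reach the parser. It is only an assumption of how
such a step could look: TrainingDataCleaner and its methods are hypothetical
names, it uses no OpenNLP API, and it only checks that each leaf node has
exactly two fields (label and token), one parse per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-cleaning step for this sketch; not part of OpenNLP. */
final class TrainingDataCleaner {

    // Matches an innermost bracket group, i.e. one without nested parentheses.
    private static final Pattern LEAF = Pattern.compile("\\(([^()]*)\\)");

    private TrainingDataCleaner() {
    }

    /** Returns true if every leaf node holds exactly a label and a token. */
    static boolean hasCompleteLeaves(String parse) {
        Matcher m = LEAF.matcher(parse);
        while (m.find()) {
            String body = m.group(1).trim();
            if (body.isEmpty() || body.split("\\s+").length != 2) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the well-formed lines in their original order and appends the
     * 1-based line numbers of the rejected lines to {@code rejected}.
     */
    static List<String> keepWellFormed(List<String> lines, List<Integer> rejected) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (hasCompleteLeaves(line)) {
                kept.add(line);
            } else {
                rejected.add(i + 1);
            }
        }
        return kept;
    }
}
```

Collecting the rejected line numbers rather than silently dropping the lines
lets whoever prepares the data go back and fix entries such as "( -RRB-)" by
hand.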

> Code in tools/parser throws some NullPointerExceptions when dealing with poor 
> training data
> -------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-597
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-597
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: tools-1.5.3
>         Environment: Windows 7 + java 1.7.0_21 
>            Reporter: Ioan Barbulescu
>            Priority: Minor
>             Fix For: 1.6.0
>
>         Attachments: tools.patch
>
>
> I was trying to train the Treebank Parser with some new data.
> Truth be told, the data was poorly formatted. Specifically, instead of 
> "(-RRB- -RRB-)", it contained "( -RRB-)".
> The same was true of the -LRB- constructions.
> Because of this input data, the parsing code was throwing 
> NullPointerExceptions.
> The fixes consist of a few additional "if()" checks to guard against null 
> pointers.
> The fixes are in 3 files, attached as a diff. The diff was created by svn, 
> run in the opennlp-tool/.../parser directory.



