[ https://issues.apache.org/jira/browse/OPENNLP-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771646#comment-13771646 ]
Ioan Barbulescu commented on OPENNLP-597:
-----------------------------------------
I see your point.
Below are a few examples of input data that should reproduce the original
exceptions:
Missing tag (line 1):
ROOT ( (NP (NP (PRP$ Your) (NN contribution))
    (PP (TO to)
      (NP (NNP Goodwill))))
  (VP (MD will)
    (VP (VB mean)
      (NP (NP (JJR more))
        (SBAR (IN than)
          (S (NP (PRP you))
            (VP (MD may)
              (VP (VB know))))))))
  (. .)))
Garbage after tag (line 1):
ROOT (S (NP (NP-2 (PRP$ Your) (NN contribution))
    (PP (TO to)
      (NP (NNP Goodwill))))
  (VP (MD will)
    (VP (VB mean)
      (NP (NP (JJR more))
        (SBAR (IN than)
          (S (NP (PRP you))
            (VP (MD may)
              (VP (VB know))))))))
  (. .)))
Missing RRB:
(ROOT (S (NP (DT The) (NN magazine))
  (VP (VBD stated)
    (SBAR (IN that)
      (S (NP (NP (NNP Goodwill))
          (-LRB- -LRB-)
          (CONJP (RB as) (RB well) (IN as))
          (NP (DT the) (JJ other) (NNS standouts))
          ( -RRB-))
        (VP (VBZ is)
          ('' ")
          (ADJP (RB uniquely) (JJ effective) (, ,) (JJ innovative)
            (CC or) (JJ valuable))))))
  (. .)
  ('' ")))
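Before a file like this reaches the trainer, the "missing tag" and "missing RRB" defects above can be caught with a plain text scan, so that a bad line is reported with a position instead of surfacing later as a NullPointerException. This is a minimal Java sketch under stated assumptions, not OpenNLP code: `TreebankLineChecker` and `check` are hypothetical names, literal brackets in the text are assumed to be escaped as -LRB-/-RRB- (so every "(" and ")" is structural), and only two rules are checked: each "(" must be immediately followed by a tag, and parentheses must balance. It does not judge tags such as "NP-2", since which tag suffixes the parser accepts is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-training check for one bracketed (Penn Treebank style) parse.
 * Assumes every '(' and ')' is structural, i.e. literal brackets in the text are
 * already escaped as -LRB- / -RRB-. Problems are returned as messages rather than thrown.
 */
public class TreebankLineChecker {

    public static List<String> check(String parse) {
        List<String> problems = new ArrayList<>();
        int depth = 0;
        for (int i = 0; i < parse.length(); i++) {
            char c = parse.charAt(i);
            if (c == '(') {
                depth++;
                // The character right after '(' must start a tag: not the end of the
                // input, not whitespace, and not another bracket.
                boolean atEnd = i + 1 >= parse.length();
                char next = atEnd ? ' ' : parse.charAt(i + 1);
                if (atEnd || Character.isWhitespace(next) || next == '(' || next == ')') {
                    problems.add("missing tag at offset " + i);
                }
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    problems.add("unmatched ')' at offset " + i);
                    depth = 0; // resynchronise so each stray ')' is reported once
                }
            }
        }
        if (depth > 0) {
            problems.add(depth + " unclosed '(' at end of input");
        }
        return problems;
    }

    public static void main(String[] args) {
        String[] samples = {
            "(ROOT (S (NP (PRP It)) (VP (VBZ works)) (. .)))",           // well formed
            "(ROOT ( (NP (PRP It)) (VP (VBZ fails)) (. .)))",            // missing S tag
            "(ROOT (S (NP (NNP Goodwill) ( -RRB-)) (VP (VBZ is))))"      // missing -RRB- tag
        };
        for (String s : samples) {
            System.out.println(check(s));
        }
    }
}
```

Running `main` prints an empty list for the first sample and a single "missing tag" message for each of the other two. Collecting all problems instead of stopping at the first one makes it practical to clean up a whole training file in one pass.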
Thank you.
> Code in tools/parser throws some NullPointerExceptions when dealing with poor training data
> -------------------------------------------------------------------------------------------
>
> Key: OPENNLP-597
> URL: https://issues.apache.org/jira/browse/OPENNLP-597
> Project: OpenNLP
> Issue Type: Bug
> Components: Parser
> Affects Versions: tools-1.5.3
> Environment: Windows 7 + java 1.7.0_21
> Reporter: Ioan Barbulescu
> Priority: Minor
> Fix For: 1.6.0
>
> Attachments: tools.patch
>
>
> I was trying to train the Treebank Parser with some new data.
> Truth be told, the data was poorly formatted. Specifically, instead of
> "(-RRB- -RRB-)", it contained "( -RRB-)", and the same problem affected
> -LRB- constructions.
> Given this input data, the parsing code threw several NullPointerExceptions.
> The fixes consist of additional "if()" checks that guard against null
> pointers.
> The fixes touch 3 files and are attached as a diff, created with svn in
> the opennlp-tool/.../parser directory.
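The data defect in the quoted report is narrow enough that the training file could also be normalised before training, independently of the null checks in the patch. A minimal Java sketch, assuming the problem only ever appears in exactly the form "( -RRB-)" or "( -LRB-)": the hypothetical `BracketTokenRepair.repair` (not part of OpenNLP) uses a regular expression to rewrite each occurrence to "(-RRB- -RRB-)" or "(-LRB- -LRB-)" and leaves well-formed tokens alone. It does not repair other defects, such as a missing phrase tag in "( (NP".

```java
import java.util.regex.Pattern;

/**
 * Hypothetical clean-up pass for bracket tokens whose tag is missing:
 * "( -RRB-)" becomes "(-RRB- -RRB-)" and "( -LRB-)" becomes "(-LRB- -LRB-)".
 * Already well-formed tokens such as "(-RRB- -RRB-)" do not match and are unchanged.
 */
public class BracketTokenRepair {

    // '(' + at least one whitespace character + -LRB- or -RRB- (captured) + ')'
    private static final Pattern MISSING_BRACKET_TAG =
            Pattern.compile("\\(\\s+(-[LR]RB-)\\)");

    public static String repair(String parse) {
        // "$1" is the captured -LRB-/-RRB- token; it is reused as both tag and word.
        return MISSING_BRACKET_TAG.matcher(parse).replaceAll("($1 $1)");
    }

    public static void main(String[] args) {
        String broken = "(NP (NP (NNP Goodwill)) ( -LRB-) (NP (NNS standouts)) ( -RRB-))";
        System.out.println(repair(broken));
    }
}
```

For example, `repair("(NP (NNS standouts) ( -RRB-))")` returns `"(NP (NNS standouts) (-RRB- -RRB-))"`. Repairing the data this way complements, rather than replaces, making the parser itself robust to bad input.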
--
This message is automatically generated by JIRA.