[
https://issues.apache.org/jira/browse/OPENNLP-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250115#comment-14250115
]
William Colen commented on OPENNLP-629:
---------------------------------------
What happens with "The dog barks to the cat."? Maybe the training data doesn't
have enough examples of this structure.
I have never tried the English parser, but I had similar issues with
Portuguese, and the only way to fix was by adding a POS dictionary (don't know
if the parser allows) and extending the training dataset.
> Third person singular verbs are wrongly tagged as NNS instead of VBG
> --------------------------------------------------------------------
>
> Key: OPENNLP-629
> URL: https://issues.apache.org/jira/browse/OPENNLP-629
> Project: OpenNLP
> Issue Type: Bug
> Components: Parser
> Affects Versions: tools-1.5.3
> Environment: Windows, java 8
> Reporter: Ioan Barbulescu
> Priority: Minor
>
> Hi team
> In many cases, verbs (third person, singular) are wrongly tagged as "NNS"
> instead of being tagged as VBG.
> For example, for "the dog barks", we get the following parsing results:
> -0.4873670715270621 = DT/0.9543650543068872 NN/0.9934635416261295
> NNS/0.6478473815054814
> -2.3176263333647076 = DT/0.9543650543068872 NN/0.9934635416261295
> ./0.10389656993335769
> -2.5438814602384756 = DT/0.9543650543068872 NN/0.9934635416261295
> POS/0.08285903227408052
> -3.1472424852917578 = DT/0.9543650543068872 NN/0.9934635416261295
> VBG/0.045321418371414506
> -3.3093737662787484 = DT/0.9543650543068872 NN/0.9934635416261295
> RB/0.03853814197383135
> -3.785492750117388 = DT/0.9543650543068872 NN/0.9934635416261295
> IN/0.023939491699927738
> -4.419574088556415 = DT/0.9543650543068872 NN/0.9934635416261295
> NN/0.0126980460743554
> -4.641227787202645 = DT/0.9543650543068872 NN/0.9934635416261295
> WDT/0.010173582713485872
> -4.645517470925252 = DT/0.9543650543068872 NN/0.9934635416261295
> :/0.010130034731632277
> -5.319832699567059 = DT/0.9543650543068872 NN/0.9934635416261295
> ''/0.005161305328825757
> (TOP (NP (DT the) (NN dog) (NNS barks)))
> 2.6064504449697834
> (TOP (S (NP (DT the) (NN dog)) (NNS barks)))
> 1.9485980564427359
> The biggest probability for the third term is found for NNS - by far - 0.64.
> In comparison, VBG is found with a probability of only 0.04.
> This parsing error manifests itself consistently, for most occurrences of the
> third person / singular verbs, regardless the context.
> Am I missing something?
> Maybe there is some supplementary configuration that controls this?
> Can this be fixed only through code, or we need to patch our training data
> set?
> Thank you so much.
> BR,
> Ioan
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)