[ 
https://issues.apache.org/jira/browse/OPENNLP-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zemerick updated OPENNLP-1330:
-----------------------------------
    Fix Version/s:     (was: 1.9.3)
                   1.9.4

> Parser top k parses doesn't show "top" (highest probability) parses.
> --------------------------------------------------------------------
>
>                 Key: OPENNLP-1330
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1330
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.9.4
>            Reporter: Eric Ihli
>            Priority: Minor
>              Labels: Parser
>             Fix For: 1.9.4
>
>         Attachments: FixParserTopK.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
>  The default "top n parses" of 1 shows the most likely parse; one with a log 
> prob of -2.9.
> Passing in anything greater than 1 returns the least probable parses.
> {code:java}
> ➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 1 
> ~/src/prhyme/resources/models/en-parser-chunking.bin
> Loading Parser model ... done (2.515s)
> 0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
> Average: 41.7 sent/s
> Total: 1 sent
> Runtime: 0.024s
> Execution time: 2.585 seconds
> ➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 2 
> ~/src/prhyme/resources/models/en-parser-chunking.bin
> Loading Parser model ... done (2.578s)
> 0 -14.968552218782305 (TOP (S (NP (NNP Eric)) (SBAR (S (ADVP (VBZ is)) (VBG 
> testing.)))))
> 1 -13.957766393572408 (TOP (S (SBAR (S (NP (NNP Eric)) (ADVP (VBZ is)))) (VBG 
> testing.)))
> Average: 95.2 sent/s
> Total: 2 sent
> Runtime: 0.021s
> Execution time: 2.640 seconds
> {code}
>  The fix is clear and simple. We should be taking from the first of the 
> TreeSet rather than from the end.
> {code:java}
>  else if (numParses == 1) {
>  return new Parse[] {completeParses.first()};
>  }
>  else {
>  List<Parse> topParses = new ArrayList<>(numParses);
>  while (!completeParses.isEmpty() && topParses.size() < numParses) {
>  Parse tp = completeParses.last();      //// <--- Change to .first()
>  completeParses.remove(tp);
>  topParses.add(tp);
>  //parses.remove(tp);
>  }
>  return topParses.toArray(new Parse[topParses.size()]);
>  }{code}
> After patch, results are what I expect.
>  
> {code:java}
> ➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 1 
> ~/src/prhyme/resources/models/en-parser-chunking.bin
> Loading Parser model ... done (2.517s)
> 0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
> Average: 45.5 sent/s
> Total: 1 sent
> Runtime: 0.022s
> Execution time: 2.580 seconds
> ➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 2 
> ~/src/prhyme/resources/models/en-parser-chunking.bin
> Loading Parser model ... done (2.530s)
> 0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
> 1 -3.1132674983145825 (TOP (S (NP (NNP Eric)) (VP (VBZ is) (NP (NN 
> testing.)))))
> Average: 90.9 sent/s
> Total: 2 sent
> Runtime: 0.022s
> Execution time: 2.596 seconds
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to