[jira] [Commented] (OPENNLP-1330) Parser top k parses doesn't show "top" (highest probability) parses.

ASF GitHub Bot (Jira) Fri, 18 Jun 2021 14:15:02 -0700


    [ 
https://issues.apache.org/jira/browse/OPENNLP-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365723#comment-17365723
 ]


ASF GitHub Bot commented on OPENNLP-1330:
-----------------------------------------

eihli commented on pull request #392:
URL: https://github.com/apache/opennlp/pull/392#issuecomment-864057912


   I think it's OK to merge this without adding a test. I couldn't find a 
location in the existing tests files that would be a good spot to add this. I'd 
love to dig in and figure out how to make a test case for this fit within an 
existing test file or create a new test. But that's a tall order (for me, given 
my newness to the codebase). 
   
   It would add significant delay for a small change that can be confidently 
reasoned about. The risk, in this case, is not that a code-related bug is 
introduced. The risk is that a reason/understanding-based bug is introduced. A 
test wouldn't fix faulty reasoning.
   
   As for the reason this is `first` rather than `last`, 
https://docs.oracle.com/javase/7/docs/api/java/util/TreeSet.html.
   
   ```
   A NavigableSet implementation based on a TreeMap. 
   The elements are ordered using their natural ordering, 
   or by a Comparator provided at set creation time, 
   depending on which constructor is used. 
   ```
   
   `completeParses` is defined as a TreeSet on line 187:
   
   ```
   completeParses = new TreeSet<>();
   ```
   
   And `Parse`s that are added to `completeParses` have a compareTo so they are 
ordered by their log probability.
   
   ```
     public int compareTo(Parse p) {
       return Double.compare(p.getProb(), this.getProb());
     }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Parser top k parses doesn't show "top" (highest probability) parses.
> --------------------------------------------------------------------
>
>                 Key: OPENNLP-1330
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1330
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.9.4
>            Reporter: Eric Ihli
>            Priority: Minor
>              Labels: Parser
>         Attachments: FixParserTopK.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
>  The default "top n parses" of 1 shows the most likely parse; one with a log 
> prob of -2.9.
> Passing in anything greater than 1 returns the least probable parses.
> {code:java}
> ➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 1 
> ~/src/prhyme/resources/models/en-parser-chunking.bin
> Loading Parser model ... done (2.515s)
> 0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
> Average: 41.7 sent/s
> Total: 1 sent
> Runtime: 0.024s
> Execution time: 2.585 seconds
> ➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 2 
> ~/src/prhyme/resources/models/en-parser-chunking.bin
> Loading Parser model ... done (2.578s)
> 0 -14.968552218782305 (TOP (S (NP (NNP Eric)) (SBAR (S (ADVP (VBZ is)) (VBG 
> testing.)))))
> 1 -13.957766393572408 (TOP (S (SBAR (S (NP (NNP Eric)) (ADVP (VBZ is)))) (VBG 
> testing.)))
> Average: 95.2 sent/s
> Total: 2 sent
> Runtime: 0.021s
> Execution time: 2.640 seconds
> {code}
>  The fix is clear and simple. We should be taking from the first of the 
> TreeSet rather than from the end.
> {code:java}
>  else if (numParses == 1) {
>  return new Parse[] {completeParses.first()};
>  }
>  else {
>  List<Parse> topParses = new ArrayList<>(numParses);
>  while (!completeParses.isEmpty() && topParses.size() < numParses) {
>  Parse tp = completeParses.last();      //// <--- Change to .first()
>  completeParses.remove(tp);
>  topParses.add(tp);
>  //parses.remove(tp);
>  }
>  return topParses.toArray(new Parse[topParses.size()]);
>  }{code}
> After patch, results are what I expect.
>  
> {code:java}
> ➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 1 
> ~/src/prhyme/resources/models/en-parser-chunking.bin
> Loading Parser model ... done (2.517s)
> 0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
> Average: 45.5 sent/s
> Total: 1 sent
> Runtime: 0.022s
> Execution time: 2.580 seconds
> ➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 2 
> ~/src/prhyme/resources/models/en-parser-chunking.bin
> Loading Parser model ... done (2.530s)
> 0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
> 1 -3.1132674983145825 (TOP (S (NP (NNP Eric)) (VP (VBZ is) (NP (NN 
> testing.)))))
> Average: 90.9 sent/s
> Total: 2 sent
> Runtime: 0.022s
> Execution time: 2.596 seconds
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (OPENNLP-1330) Parser top k parses doesn't show "top" (highest probability) parses.

Reply via email to