[
https://issues.apache.org/jira/browse/OPENNLP-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583088#comment-13583088
]
Joern Kottmann commented on OPENNLP-562:
----------------------------------------
No, we can close the ticket as well, but by convention the person who opened
the ticket is the one who closes it, confirming that the fix really solved
the problem.
> invoking .find() on a RegexNameFinder instance brings back Spans with
> identical start/end indices
> -------------------------------------------------------------------------------------------------
>
> Key: OPENNLP-562
> URL: https://issues.apache.org/jira/browse/OPENNLP-562
> Project: OpenNLP
> Issue Type: Bug
> Components: Name Finder
> Affects Versions: tools-1.5.2-incubating
> Environment: Ubuntu 12.10 64-bit Java 7 u11
> Reporter: Jim Piliouras
> Assignee: James Kosin
> Labels: bug, regex, span
> Fix For: tools-1.5.3
>
> Attachments: OPENNLP-562.patch
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> The RegexNameFinder class has a serious bug: whenever it finds something, it
> produces a Span with the same start/end index. This happens because
> 'sentencePosTokenMap' stores the same position for the start and end of the
> token. Conceptually this is fine, after all it is the same token, but later
> on matcher.start()/end() is invoked to determine what to ask from the
> map. If we've stored the same position, we will get the same number back and
> the Span will be ruined. The trick here is to store i+1 as the endIndex for
> that token in the map. That is essentially the position of the next token,
> but since we're expecting tokenized text anyway, everything is fine.
> Untokenized text breaks the system anyway, so in my opinion it is safe to
> apply the forthcoming patch. A dirty approach would be to leave the map as
> is and simply replace 'matcher.end()' with 'matcher.end()+1' when we're doing
> the lookup.
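
The behaviour described above can be reproduced with a small stand-alone
sketch. This is not the OpenNLP source or the attached patch: the class
SpanIndexSketch, the method findSpans, and the exclusiveEnd flag are
hypothetical names used only for illustration, spans are plain int pairs
rather than Span objects, and tokens are assumed to be non-empty and joined
by single spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the mapping logic described in the ticket;
// it does not depend on OpenNLP.
final class SpanIndexSketch {

    // Returns {startToken, endToken} pairs for every regex match.
    // exclusiveEnd == false stores i for a token's end position (the reported behaviour);
    // exclusiveEnd == true stores i + 1 (the proposed fix).
    static List<int[]> findSpans(String[] tokens, Pattern pattern, boolean exclusiveEnd) {
        // Maps a character offset in the joined sentence to a token index.
        Map<Integer, Integer> sentencePosTokenMap = new HashMap<>();
        StringBuilder sentence = new StringBuilder();

        for (int i = 0; i < tokens.length; i++) {
            sentencePosTokenMap.put(sentence.length(), i);      // offset where token i starts
            sentence.append(tokens[i]);
            sentencePosTokenMap.put(sentence.length(),          // offset where token i ends
                    exclusiveEnd ? i + 1 : i);
            if (i < tokens.length - 1) {
                sentence.append(' ');
            }
        }

        List<int[]> spans = new ArrayList<>();
        Matcher matcher = pattern.matcher(sentence);
        while (matcher.find()) {
            Integer start = sentencePosTokenMap.get(matcher.start());
            Integer end = sentencePosTokenMap.get(matcher.end());
            // Matches that do not line up with token boundaries are skipped.
            if (start != null && end != null) {
                spans.add(new int[] {start, end});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"call", "555-1234", "now"};
        Pattern phone = Pattern.compile("\\d{3}-\\d{4}");

        // prints "end stored as i:   [1, 1]"
        System.out.println("end stored as i:   "
                + Arrays.toString(findSpans(tokens, phone, false).get(0)));
        // prints "end stored as i+1: [1, 2]"
        System.out.println("end stored as i+1: "
                + Arrays.toString(findSpans(tokens, phone, true).get(0)));
    }
}
```

In this sketch, storing i for the end position yields the empty span [1, 1],
while storing i+1 yields [1, 2], which covers the single matched token. The
alternative lookup at matcher.end()+1 would, under the same assumptions, find
no entry for a match that ends on the last token, since no token starts
after it.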