[ https://issues.apache.org/jira/browse/OPENNLP-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618860#comment-17618860 ]
Jeff Zemerick commented on OPENNLP-1388:
----------------------------------------
I don't think it's a good idea to change how a Span is created in the
NameFinder implementations because the codecs are all based on token positions.
Perhaps adding a new function to Span to get the covered text when the
positions are token-based is the best way to go.
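
To make the suggestion concrete, here is a minimal sketch of how a token-based
accessor could sit next to the character-based one. It is standalone Java and
does not use any OpenNLP classes. The names TokenSpanText, coveredCharText, and
coveredTokenText are hypothetical, and joining tokens with a single space is an
assumption that does not preserve the original whitespace.

```java
import java.util.Arrays;

public class TokenSpanText {

    // Character-based lookup: treats [start, end) as character offsets.
    // This mirrors the behavior reported in the issue for getCoveredText().
    public static String coveredCharText(CharSequence text, int start, int end) {
        return text.subSequence(start, end).toString();
    }

    // Hypothetical token-based lookup (not part of the OpenNLP API):
    // treats [start, end) as token indices and joins the covered tokens
    // with a single space (an assumption about the desired output).
    public static String coveredTokenText(String[] tokens, int start, int end) {
        if (start < 0 || end > tokens.length || start > end) {
            throw new IllegalArgumentException(
                "Invalid token range [" + start + ".." + end + ")");
        }
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String sentence = "Neil Abercrombie Anibal Acevedo-Vila Gary Ackerman";
        // Whitespace tokenization is only for illustration.
        String[] tokens = sentence.split(" ");

        System.out.println(coveredCharText(sentence, 0, 2));  // prints "Ne"
        System.out.println(coveredTokenText(tokens, 0, 2));   // prints "Neil Abercrombie"
    }
}
```

With the span [0..2) from the issue description, the character-based lookup
yields "Ne" and the token-based one yields "Neil Abercrombie", so the two
interpretations stay available side by side and span creation is untouched.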
> Inconsistency in span.getCoveredText()
> --------------------------------------
>
> Key: OPENNLP-1388
> URL: https://issues.apache.org/jira/browse/OPENNLP-1388
> Project: OpenNLP
> Issue Type: Task
> Affects Versions: 1.9.4, 2.0.0
> Reporter: Jeff Zemerick
> Assignee: Jeff Zemerick
> Priority: Major
>
> Span.getCoveredText() is getting the string based on the character start/end
> and not the token start/end.
> Example:
> string = "Neil Abercrombie Anibal Acevedo-Vila Gary Ackerman"
> span = [0..2) person
> span.getCoveredText(sentence) returns "Ne" and not "Neil Abercrombie"
> What is the correct behavior?