[jira] [Commented] (OPENNLP-1388) Inconsistency in span.getCoveredText()

Lara Marinov (Jira) Sun, 26 Nov 2023 22:11:05 -0800


    [ https://issues.apache.org/jira/browse/OPENNLP-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789914#comment-17789914 ]


Lara Marinov commented on OPENNLP-1388:
---------------------------------------

Is this issue still open for work? The branch link in the issue description should be 
updated to 
[https://github.com/apache/opennlp/compare/main...jzonthemtn:opennlp:OPENNLP-1388].
{{getCoveredText}} in {{Span}} works only with character offsets, because a {{Span}} 
does not know where the tokens are. How would one add a method to {{Span}} that 
returns the covered text when the span's positions are token indices? How could 
{{Span}} map token indices to character positions? If you could give a hint about 
the API design, we could work on the implementation.
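One possible shape for the API, sketched under assumptions: since a {{Span}} cannot know where tokens sit in the text, the caller could pass in the character offsets of each token (in OpenNLP, {{Tokenizer.tokenizePos(String)}} returns these as an array of {{Span}}s). The sketch is plain JDK Java with no OpenNLP dependency; {{Range}}, {{tokenizePos}} and {{coveredTextByTokens}} are hypothetical names, and the whitespace tokenizer is only a stand-in for a real one. Mapping the token span to a character span and then taking a substring keeps the original spacing and punctuation.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSpanText {

    // Hypothetical stand-in for an index range [start, end), similar to OpenNLP's Span.
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumed whitespace tokenizer that records each token's character offsets,
    // similar in spirit to Tokenizer.tokenizePos(String).
    static List<Range> tokenizePos(String text) {
        List<Range> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Range(start, i));
            }
        }
        return tokens;
    }

    // Hypothetical token-aware variant of getCoveredText: the caller supplies the
    // token character offsets, because a span of token indices cannot know them.
    static String coveredTextByTokens(Range tokenSpan, List<Range> tokenOffsets, String text) {
        if (tokenSpan.start < 0 || tokenSpan.start >= tokenSpan.end
                || tokenSpan.end > tokenOffsets.size()) {
            throw new IllegalArgumentException("token span out of range");
        }
        int charStart = tokenOffsets.get(tokenSpan.start).start;
        int charEnd = tokenOffsets.get(tokenSpan.end - 1).end;
        return text.substring(charStart, charEnd);
    }

    public static void main(String[] args) {
        String sentence = "Neil Abercrombie Anibal Acevedo-Vila Gary Ackerman";
        List<Range> tokens = tokenizePos(sentence);
        Range person = new Range(0, 2); // token indices, not character offsets
        System.out.println(coveredTextByTokens(person, tokens, sentence)); // Neil Abercrombie
    }
}
```

OpenNLP's {{Span}} also has a static {{spansToStrings(Span[], String[])}} helper that joins the tokens covered by token-index spans with spaces, so it may already cover part of this use case.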

> Inconsistency in span.getCoveredText()
> --------------------------------------
>
>                 Key: OPENNLP-1388
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1388
>             Project: OpenNLP
>          Issue Type: Task
>    Affects Versions: 1.9.4, 2.0.0
>            Reporter: Jeff Zemerick
>            Assignee: Jeff Zemerick
>            Priority: Major
>
> Span.getCoveredText() is getting the string based on the character start/end 
> and not the token start/end.
> Example:
> sentence = "Neil Abercrombie Anibal Acevedo-Vila Gary Ackerman"
> span = [0..2) person
> span.getCoveredText(sentence) returns "Ne" and not "Neil Abercrombie"
> What is the correct behavior?
> There is a branch that illustrates this at 
> https://github.com/apache/opennlp/compare/master...jzonthemtn:opennlp:OPENNLP-1388.
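A minimal, self-contained Java illustration of the two readings of the same [0..2) span from the report (plain JDK, no OpenNLP; {{byChars}} and {{byTokens}} are hypothetical helpers, and splitting on whitespace is an assumed tokenization):

```java
import java.util.Arrays;

public class CoveredTextDemo {

    // Treats [start, end) as character offsets, which is how the report
    // describes getCoveredText behaving.
    static String byChars(String text, int start, int end) {
        return text.substring(start, end);
    }

    // Treats [start, end) as token indices over a simple whitespace split
    // (an assumption; a real pipeline would use its own tokenizer).
    static String byTokens(String text, int start, int end) {
        String[] tokens = text.trim().split("\\s+");
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String sentence = "Neil Abercrombie Anibal Acevedo-Vila Gary Ackerman";
        System.out.println(byChars(sentence, 0, 2));  // Ne
        System.out.println(byTokens(sentence, 0, 2)); // Neil Abercrombie
    }
}
```

Joining tokens with a single space does not necessarily reproduce the original whitespace, so a token-based method would probably need the tokens' character offsets rather than the token strings alone.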



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
