[ https://issues.apache.org/jira/browse/STANBOL-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-818.
-----------------------------------------

    Resolution: Fixed

Fixed in http://svn.apache.org/viewvc?rev=1414147&view=rev

This was caused by a bug in the ProcessingState class. This class iterates over
the Sections (typically Sentences) of the parsed content and collects the
Tokens within each Section. If a Section does not contain any linkable Token,
it continues with the next Section. However, in those cases the token list was
not correctly reset, so it still held the Tokens of the skipped Section.
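The reset described above can be illustrated with a minimal Java sketch. It is
not the code of ProcessingState or of revision 1414147: the class
SectionTokenCollector and the Section and Token records are hypothetical
simplifications, assuming that Tokens carry absolute character offsets and a
linkable flag. The point is where the token list is cleared: once per Section,
including Sections that are skipped.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical, simplified model of iterating over Sections and collecting their
 * Tokens. This is NOT Stanbol's ProcessingState; all names are illustrative.
 */
public class SectionTokenCollector {

    /** A token with absolute character offsets and a flag telling whether it is linkable. */
    public record Token(int start, int end, boolean linkable) {}

    /** A section (typically a sentence) with absolute offsets and the tokens it contains. */
    public record Section(int start, int end, List<Token> tokens) {}

    private final Iterator<Section> sections;
    private final List<Token> tokens = new ArrayList<>();
    private Section current;

    public SectionTokenCollector(List<Section> sections) {
        this.sections = sections.iterator();
    }

    /**
     * Moves to the next section that contains at least one linkable token and
     * collects its tokens. Returns false when no such section is left.
     */
    public boolean next() {
        while (sections.hasNext()) {
            Section section = sections.next();
            // Reset for EVERY section, including the ones skipped below. Clearing only
            // once per call to next() would leave the tokens of a skipped section in the list.
            tokens.clear();
            boolean hasLinkable = false;
            for (Token token : section.tokens()) {
                tokens.add(token);
                hasLinkable |= token.linkable();
            }
            if (hasLinkable) {
                current = section;
                return true;
            }
            // No linkable token in this section: continue with the next one.
        }
        current = null;
        tokens.clear();
        return false;
    }

    /** The section selected by the last successful call to next(), or null. */
    public Section current() {
        return current;
    }

    /** An immutable copy of the tokens collected for the current section. */
    public List<Token> tokens() {
        return List.copyOf(tokens);
    }
}
```

With a first Section that has no linkable Token and a second Section that has
one, next() returns the second Section and tokens() holds only its Tokens. If the
list were cleared only once per call to next(), the Tokens of the first Section
would be returned as well, which is the situation described in this issue.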

Because of that, the token list contained Tokens of the previous Section
whenever the previous sentence did not contain a single linkable Token. In such
situations the offset calculations were flawed, resulting in negative indexes
being passed to substring().
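A second hypothetical sketch shows how such a stale Token can turn into a
negative index. It assumes, purely for illustration, that a Token's text is
taken from the Section text by subtracting the Section start from the Token's
absolute offsets; the class NegativeOffsetDemo and its tokenText() helper are
invented here, and the real getTokenText() may compute this differently. With
the example text from the issue, the second sentence starts at offset 52, so a
leftover Token "city" (offset 46) yields a relative start of 46 - 52 = -6, the
same value as in the reported exception, although the sketch does not claim to
reproduce the actual code path.

```java
/** Hypothetical demonstration of a stale token producing a negative substring index. */
public class NegativeOffsetDemo {

    /**
     * Hypothetical helper: returns a token's text from the section text by converting
     * the token's absolute offsets into offsets relative to the section start. This is
     * only correct if the token really lies inside the section.
     */
    public static String tokenText(String sectionText, int sectionStart,
                                   int tokenStart, int tokenEnd) {
        return sectionText.substring(tokenStart - sectionStart, tokenEnd - sectionStart);
    }

    public static void main(String[] args) {
        String text = "It comprises 114 counties and one independent city. "
                + "Missouri's capital is Jefferson City. ";
        int sectionStart = text.indexOf("Missouri"); // start of the second sentence
        String sectionText = text.substring(sectionStart);

        // A token that belongs to the current section: offsets stay within bounds.
        int ok = text.indexOf("Jefferson");
        System.out.println(tokenText(sectionText, sectionStart, ok, ok + "Jefferson".length()));

        // A stale token left over from the first sentence: its relative start is negative.
        int stale = text.indexOf("city");
        try {
            tokenText(sectionText, sectionStart, stale, stale + "city".length());
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("relative start " + (stale - sectionStart) + ": "
                    + e.getClass().getSimpleName());
        }
    }
}
```

Clearing the token list for every Section, as in the fix, ensures that only
Tokens inside the current Section reach such an offset calculation.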
                
> EntitylinkingEngine encounters StringIndexOutOfBounds exceptions
> ----------------------------------------------------------------
>
>                 Key: STANBOL-818
>                 URL: https://issues.apache.org/jira/browse/STANBOL-818
>             Project: Stanbol
>          Issue Type: Bug
>          Components: Enhancer
>            Reporter: Rupert Westenthaler
>
> For some texts the EntityLinkingEngine encounters negative String indexes
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -6
>       at java.lang.String.substring(String.java:1931)
>       at org.apache.stanbol.enhancer.engines.entitylinking.impl.ProcessingState.getTokenText(ProcessingState.java:324)
> A text that triggers this is "It comprises 114 counties and one independent 
> city. Missouri's capital is Jefferson City. "

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
