[ 
https://issues.apache.org/jira/browse/TIKA-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2317.
-------------------------------
    Resolution: Fixed

> Add alert that string was truncated before counting tokens
> ----------------------------------------------------------
>
>                 Key: TIKA-2317
>                 URL: https://issues.apache.org/jira/browse/TIKA-2317
>             Project: Tika
>          Issue Type: Improvement
>          Components: tika-eval
>            Reporter: Tim Allison
>            Priority: Trivial
>
> As a memory safety feature, there is a hard limit on the length of the string
> that the token counter processes.  We should alert the user when the string
> is truncated, because comparisons can be misleading: extractA may pack more
> words into its first 1000000 characters than extractB does, even though
> extractB actually contains more tokens overall.
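
To illustrate the concern in the description, here is a minimal sketch of a token counter that truncates its input and reports that it did so. This is not the tika-eval implementation: the class name TruncatingTokenCounter, the Result holder, the count method, and the whitespace-based tokenization are all assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of tika-eval.
final class TruncatingTokenCounter {

    /** Token total plus a flag saying whether the input was cut short. */
    static final class Result {
        final int tokenCount;
        final boolean truncated;

        Result(int tokenCount, boolean truncated) {
            this.tokenCount = tokenCount;
            this.truncated = truncated;
        }
    }

    // Assumed tokenization: each run of non-whitespace characters is a token.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    private TruncatingTokenCounter() {
    }

    /**
     * Counts tokens in at most the first maxLength characters of text and
     * records whether anything beyond that limit was dropped.
     */
    static Result count(String text, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        boolean truncated = text.length() > maxLength;
        String processed = truncated ? text.substring(0, maxLength) : text;

        int tokens = 0;
        Matcher matcher = TOKEN.matcher(processed);
        while (matcher.find()) {
            tokens++;
        }
        return new Result(tokens, truncated);
    }
}
```

With a limit of 5 characters, "a b c d e f" yields 3 tokens while "longword x y z w v u" yields 1, although the second string has more tokens in total. Both results carry truncated == true, which is the signal a caller would need before trusting a comparison of the two counts.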



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
