[ https://issues.apache.org/jira/browse/TIKA-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-2317.
-------------------------------
Resolution: Fixed
> Add alert that string was truncated before counting tokens
> ----------------------------------------------------------
>
> Key: TIKA-2317
> URL: https://issues.apache.org/jira/browse/TIKA-2317
> Project: Tika
> Issue Type: Improvement
> Components: tika-eval
> Reporter: Tim Allison
> Priority: Trivial
>
> As a memory safety feature, there's a hard limit on the length of the string
> that is processed by the token counter. We should alert the user when the
> string is truncated, because comparisons can be misleading when extractA
> packs more words into the first 1000000 characters than extractB does,
> even though extractB actually contains more tokens.
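
A minimal sketch of the idea described above, not the tika-eval code: a counter that caps the input length and reports whether it had to truncate. The class name, the Result type, and the whitespace tokenization are all assumptions made for illustration; only the 1000000-character figure comes from the issue text.

```java
import java.util.StringTokenizer;

public class TruncationAwareTokenCounter {

    /** Token total plus a flag saying whether the input was cut short. */
    public static final class Result {
        public final int tokenCount;
        public final boolean truncated;

        Result(int tokenCount, boolean truncated) {
            this.tokenCount = tokenCount;
            this.truncated = truncated;
        }
    }

    // Hypothetical constant mirroring the 1000000-character limit in the issue.
    public static final int DEFAULT_MAX_CHARS = 1_000_000;

    /** Counts tokens in at most maxChars characters of text. */
    public static Result count(String text, int maxChars) {
        boolean truncated = text.length() > maxChars;
        String processed = truncated ? text.substring(0, maxChars) : text;
        // Assumption: plain whitespace splitting stands in for the real analyzer.
        int tokens = new StringTokenizer(processed).countTokens();
        return new Result(tokens, truncated);
    }

    public static void main(String[] args) {
        // A tiny limit of 6 characters makes the effect easy to see.
        Result a = count("aa bb cc dd", 6);      // 4 tokens in full
        Result b = count("aaaaaa b c d e f", 6); // 6 tokens in full
        System.out.println("extractA: " + a.tokenCount + " truncated=" + a.truncated);
        System.out.println("extractB: " + b.tokenCount + " truncated=" + b.truncated);
    }
}
```

With the limit set to 6, extractA reports 2 tokens and extractB reports 1, although extractB holds more tokens overall. The truncated flag is what lets a caller warn that the two counts are not directly comparable.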
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)