Tim Allison created TIKA-2317:
---------------------------------
Summary: Add alert that string was truncated before counting tokens
Key: TIKA-2317
URL: https://issues.apache.org/jira/browse/TIKA-2317
Project: Tika
Issue Type: Improvement
Components: tika-eval
Reporter: Tim Allison
Priority: Trivial
As a memory safety feature, there's a hard limit on the length of the string
that is processed by the token counter. We should alert the user when the
string is truncated, because comparisons can be misleading: extractA may pack
more words into the first 1,000,000 characters than extractB does, even though
extractB actually contains more tokens overall.
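A minimal sketch of the requested alert, in Java since Tika is a Java project. The class and field names (TruncatingTokenCounter, CountResult, maxChars) are hypothetical and are not the tika-eval API. Whitespace splitting also stands in for whatever tokenizer tika-eval really uses. The idea is to record whether the input exceeded the cap, return that flag with the count, and log a warning so downstream comparisons can treat the count as a lower bound.

```java
import java.util.logging.Logger;

/**
 * Hypothetical sketch, not the tika-eval API: counts whitespace-delimited
 * tokens in at most maxChars characters and records whether the input was
 * truncated, so callers can flag potentially misleading comparisons.
 */
public class TruncatingTokenCounter {

    private static final Logger LOG =
            Logger.getLogger(TruncatingTokenCounter.class.getName());

    /** Token count plus the information needed to alert on truncation. */
    public static final class CountResult {
        public final int tokenCount;
        public final boolean truncated;   // true if the input exceeded maxChars
        public final int originalLength;  // length before truncation

        CountResult(int tokenCount, boolean truncated, int originalLength) {
            this.tokenCount = tokenCount;
            this.truncated = truncated;
            this.originalLength = originalLength;
        }
    }

    private final int maxChars;

    public TruncatingTokenCounter(int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        this.maxChars = maxChars;
    }

    public CountResult count(String text) {
        boolean truncated = text.length() > maxChars;
        // Memory-safety cap: only the first maxChars characters are tokenized.
        // The cut may fall mid-word, so the last counted token can be partial.
        String toProcess = truncated ? text.substring(0, maxChars) : text;

        int tokens = 0;
        // Simplification: whitespace splitting stands in for a real analyzer.
        for (String t : toProcess.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens++;
            }
        }
        if (truncated) {
            // The alert: a count from a truncated string is only a lower
            // bound and may not be comparable across extracts.
            LOG.warning("Input truncated from " + text.length() + " to "
                    + maxChars + " characters before counting tokens");
        }
        return new CountResult(tokens, truncated, text.length());
    }
}
```

With a cap of 10 characters, counting "a b c d e f g h" tokenizes only "a b c d e ", so it reports 5 tokens with truncated set to true, while "one two" reports 2 tokens untruncated. When comparing extractA with extractB, checking the truncated flag on both results before interpreting a difference in token counts avoids the misleading case described above.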
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)