Tim Allison created TIKA-2317:
---------------------------------

             Summary: Add alert that string was truncated before counting tokens
                 Key: TIKA-2317
                 URL: https://issues.apache.org/jira/browse/TIKA-2317
             Project: Tika
          Issue Type: Improvement
          Components: tika-eval
            Reporter: Tim Allison
            Priority: Trivial


As a memory safety feature, there's a hard limit on the length of the string
that is processed by the token counter.  We should alert the user when the
string is truncated, because comparisons can be misleading if extractA packs
more words into the first 1000000 characters than extractB does, even though
extractB actually contains more tokens.
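
A minimal sketch of the idea, not the tika-eval implementation: the class
TruncationAwareCounter, its Result holder, and the maxChars parameter are all
hypothetical names, and whitespace splitting stands in for whatever analyzer
the real token counter uses.  It only shows a truncation flag travelling
alongside the count so that a caller can report it.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from tika-eval.
final class TruncationAwareCounter {

    // Hypothetical result holder: the token count plus a flag to report.
    static final class Result {
        final int tokenCount;
        final boolean truncated;

        Result(int tokenCount, boolean truncated) {
            this.tokenCount = tokenCount;
            this.truncated = truncated;
        }
    }

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Counts whitespace-delimited tokens in at most maxChars characters
    // of text and records whether anything was cut off.
    static Result count(String text, int maxChars) {
        boolean truncated = text.length() > maxChars;
        String processed = truncated ? text.substring(0, maxChars) : text;
        String trimmed = processed.trim();
        int tokens = trimmed.isEmpty() ? 0 : WHITESPACE.split(trimmed).length;
        return new Result(tokens, truncated);
    }
}
```

With a limit of 5 characters, "a b c dddddddd" yields 3 tokens and
"aaaaaaaa b c d e f" yields 1, although the second string has more tokens
overall.  Both results carry truncated == true, which is the signal a report
could use to warn that the counts are not directly comparable.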



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
