[ https://issues.apache.org/jira/browse/TIKA-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613814#comment-16613814 ]
Dmitry Goldenberg commented on TIKA-2627:
-----------------------------------------
I agree, something is definitely wrong here. The whole point of a write limit is
to drop any excess text.
{code:java}
// From org.apache.tika.sax.WriteOutContentHandler
@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    if (writeLimit == -1 || writeCount + length <= writeLimit) {
        // No limit, or the text still fits: pass it through unchanged.
        super.characters(ch, start, length);
        writeCount += length;
    } else {
        // Over the limit: write only the part that fits...
        super.characters(ch, start, writeLimit - writeCount);
        writeCount = writeLimit;
        // ...then abort parsing by throwing.
        throw new WriteLimitReachedException(
                "Your document contained more than " + writeLimit
                + " characters, and so your requested limit has been"
                + " reached. To receive the full text of the document,"
                + " increase your limit. (Text up to the limit is"
                + " however available).", tag);
    }
}
{code}
This should not throw; at most, it should log a warning and keep going.
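A minimal sketch of the behaviour proposed above, assuming plain SAX from the JDK and no Tika dependency. `TruncatingContentHandler` is a hypothetical class, not part of Tika: it keeps at most `writeLimit` characters, logs a single warning through `java.util.logging` when the limit is hit, and silently drops the rest instead of throwing. Like `WriteOutContentHandler`, it treats `-1` as "no limit".

```java
import java.util.logging.Logger;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler (not a Tika class): keeps at most writeLimit characters,
 * logs one warning when the limit is reached, and drops the excess text
 * instead of throwing an exception.
 */
class TruncatingContentHandler extends DefaultHandler {
    private static final Logger LOG =
            Logger.getLogger(TruncatingContentHandler.class.getName());

    private final StringBuilder buffer = new StringBuilder();
    private final int writeLimit;      // -1 means unlimited
    private boolean limitReached = false;

    TruncatingContentHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    // Overrides may omit "throws SAXException" because nothing is thrown here.
    @Override
    public void characters(char[] ch, int start, int length) {
        if (writeLimit == -1) {
            buffer.append(ch, start, length);
            return;
        }
        int remaining = writeLimit - buffer.length();
        if (length <= remaining) {
            // Everything still fits under the limit.
            buffer.append(ch, start, length);
        } else {
            // Keep only what fits, then discard the rest of this chunk.
            if (remaining > 0) {
                buffer.append(ch, start, remaining);
            }
            if (!limitReached) {
                limitReached = true;
                LOG.warning("Write limit of " + writeLimit
                        + " characters reached; remaining text is dropped.");
            }
        }
    }

    boolean isLimitReached() {
        return limitReached;
    }

    @Override
    public String toString() {
        return buffer.toString();
    }
}
```

One trade-off to keep in mind: since nothing is thrown, the parser keeps reading the rest of the document even though its text is discarded, so a very large file still costs a full parse. The exception in the current code presumably also serves to stop parsing early.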
> Exception thrown when max string length is reached
> --------------------------------------------------
>
> Key: TIKA-2627
> URL: https://issues.apache.org/jira/browse/TIKA-2627
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.17
> Environment: Windows 2012 R2
> Java 1.8.0_151
> Reporter: Caleb Ott
> Priority: Major
> Attachments: ExceptionStacktrace.txt
>
>
> I have set the max string length and expected Tika to parse up to that limit
> and then return the text. However, for certain files it appears that once that
> limit is reached, instead of returning the text parsed so far, it throws
> an exception.
> It looks like the WriteLimitReachedException is being wrapped in another
> exception which is why it is not being caught.
> Attached is the stack trace I am getting.
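Because the limit exception can arrive wrapped in another exception, a `catch` clause for the specific type will not match it. A minimal sketch of the usual workaround: catch the broader exception and walk its `getCause()` chain. `CauseChain.anyCause` is a hypothetical helper, not a Tika API, and uses only the JDK.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical helper (not a Tika API): searches an exception's cause chain. */
final class CauseChain {
    private CauseChain() {}

    /** Returns true if t, or any exception reachable via getCause(), matches. */
    static boolean anyCause(Throwable t, Predicate<Throwable> matches) {
        // Identity-based set guards against (rare) cyclic cause chains.
        Set<Throwable> seen =
                Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (matches.test(cur)) {
                return true;
            }
        }
        return false;
    }
}
```

In Tika 1.x, `WriteOutContentHandler.isWriteLimitReached(Throwable)` performs a similar cause-chain check for its own limit exception, so where your version provides it, calling it on the handler passed to the parser is the simpler option; if it reports a hit, the text collected up to the limit is still available from the handler.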
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)