[ 
https://issues.apache.org/jira/browse/TIKA-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613814#comment-16613814
 ] 

Dmitry Goldenberg commented on TIKA-2627:
-----------------------------------------

I agree, something is definitely wrong here. The whole point of the write
limit is to just drop any excess text.

 
{code:java}
// In org/apache/tika/sax/WriteOutContentHandler
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (writeLimit == -1 || writeCount + length <= writeLimit) {
            super.characters(ch, start, length);
            writeCount += length;
        } else {
            super.characters(ch, start, writeLimit - writeCount);
            writeCount = writeLimit;
            throw new WriteLimitReachedException(
                    "Your document contained more than " + writeLimit
                    + " characters, and so your requested limit has been"
                    + " reached. To receive the full text of the document,"
                    + " increase your limit. (Text up to the limit is"
                    + " however available).", tag);
        }
    }
{code}
This should not throw; at most, it should log a warning and keep going.
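
To illustrate the "drop the excess and keep going" behavior, here is a minimal
sketch. It is not a patch to WriteOutContentHandler: TruncatingTextHandler and
isTruncated() are hypothetical names, and the class builds only on the JDK's
org.xml.sax.helpers.DefaultHandler. It assumes the caller wants the text up to
the limit plus a flag saying whether anything was dropped.

```java
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler (not part of Tika): collects character data up to
 * writeLimit and silently drops anything beyond it instead of throwing.
 */
class TruncatingTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit; // -1 means "no limit", as in the snippet above
    private boolean truncated = false;

    TruncatingTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (writeLimit == -1) {
            text.append(ch, start, length);
            return;
        }
        int remaining = writeLimit - text.length();
        if (length <= remaining) {
            text.append(ch, start, length);
        } else {
            // Keep only what still fits, remember that text was dropped,
            // and return normally so parsing can continue.
            if (remaining > 0) {
                text.append(ch, start, remaining);
            }
            truncated = true;
        }
    }

    /** True if at least one character was discarded because of the limit. */
    boolean isTruncated() {
        return truncated;
    }

    @Override
    public String toString() {
        return text.toString();
    }
}
```

With a handler along these lines the caller would read toString() after the
parse and check isTruncated(), rather than catching an exception. The trade-off
is that the parser keeps working through the rest of the document even though
the extra text is discarded.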

> Exception thrown when max string length is reached
> --------------------------------------------------
>
>                 Key: TIKA-2627
>                 URL: https://issues.apache.org/jira/browse/TIKA-2627
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>         Environment: Windows 2012 R2
> Java 1.8.0_151
>            Reporter: Caleb Ott
>            Priority: Major
>         Attachments: ExceptionStacktrace.txt
>
>
> I have set the max string length and expected tika to parse up to that limit 
> then return me the text. However, for certain files it appears that once that 
> limit is reached, instead of returning the text parsed so far, it is throwing 
> an exception.
> It looks like the WriteLimitReachedException is being wrapped in another 
> exception which is why it is not being caught.
> Attached is the stack trace I am getting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
