[ https://issues.apache.org/jira/browse/PDFBOX-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181129#comment-13181129 ]
Steve Lancashire commented on PDFBOX-847:
-----------------------------------------
What makes this particularly nasty is that one of the clients of this method,
PDFTextStripper.processPages, calls it once per page, so out-of-memory errors
can be thrown and suppressed over and over for a single large PDF document,
completely crippling a JVM.
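To make the per-page amplification concrete, here is a minimal, self-contained Java sketch. It does not use PDFBox: `decodePage`, `decode` and `countFailures` are hypothetical stand-ins, and the failure is modelled as a `ZipException` rather than a real `OutOfMemoryError` so the sketch is safe to run. When the decoder swallows the failure, a per-page loop hits it once for every page; when the decoder rethrows it, the loop stops at the first page and the caller learns that extraction is incomplete.

```java
import java.io.IOException;
import java.util.zip.ZipException;

public class SwallowVsPropagate {

    // Hypothetical stand-in for decoding one page's content stream.
    // It always fails, modelling a document whose streams fail on every page.
    static void decodePage(int page) throws ZipException {
        throw new ZipException("corrupt stream on page " + page);
    }

    // Hypothetical filter wrapper: either swallows the failure (the
    // behaviour reported in PDFBOX-847) or rethrows it to the caller.
    static void decode(int page, boolean swallow, int[] failures) throws IOException {
        try {
            decodePage(page);
        } catch (ZipException e) {
            failures[0]++;           // the failure was hit once more
            if (!swallow) {
                throw e;             // propagate: the caller finds out
            }
            // swallowing: only a log message would be written here
        }
    }

    // Mimics a per-page loop in the style of PDFTextStripper.processPages
    // and returns how many times the decode failure was hit.
    public static int countFailures(boolean swallow, int pageCount) {
        int[] failures = {0};
        try {
            for (int page = 1; page <= pageCount; page++) {
                decode(page, swallow, failures);
            }
        } catch (IOException e) {
            // the caller now knows extraction is incomplete and stops early
        }
        return failures[0];
    }

    public static void main(String[] args) {
        System.out.println("swallowing:  failure hit " + countFailures(true, 500) + " times");
        System.out.println("propagating: failure hit " + countFailures(false, 500) + " times");
    }
}
```

With a real `OutOfMemoryError` each of those swallowed hits is a full allocation failure, which is why the JVM ends up crippled rather than merely producing a noisy log.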
> FlateFilter.java swallows Exceptions (should rethrow)
> -----------------------------------------------------
>
> Key: PDFBOX-847
> URL: https://issues.apache.org/jira/browse/PDFBOX-847
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.2.1
> Reporter: Andreas Wollschlaeger
>
> I just rediscovered an issue in FlateFilter.java which I mentioned quite a
> while ago on the mailing list, and which was agreed to be a misfeature :-)
> In FlateFilter.java, at lines 115ff, we find this piece of code:
> try
> {
>     // decoding not needed
>     while ((amountRead = decompressor.read(buffer, 0,
>             Math.min(mayRead, BUFFER_SIZE))) != -1)
>     {
>         result.write(buffer, 0, amountRead);
>     }
> }
> catch (OutOfMemoryError exception)
> {
>     // if the stream is corrupt an OutOfMemoryError may occur
>     log.error("Stop reading corrupt stream");
> }
> catch (ZipException exception)
> {
>     // if the stream is corrupt an OutOfMemoryError may occur
>     log.error("Stop reading corrupt stream");
> }
> catch (EOFException exception)
> {
>     // if the stream is corrupt an OutOfMemoryError may occur
>     log.error("Stop reading corrupt stream");
> }
> which means these exceptions are discarded and not reported upstream to the
> caller. This is very unfortunate, as the caller has no means to discover that
> text extraction is incomplete. I discovered this while troubleshooting Alfresco
> DMS, which uses PDFBox for indexing PDF documents: apart from an innocent log
> message, Alfresco does not know that the conversion has failed.
> Proposed solution: re-throw all three exceptions and let the caller handle
> them.
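The proposal amounts to letting these failures reach the caller. Below is a minimal Java sketch of that idea, written against `java.util.zip` only; it is not the PDFBox `FlateFilter` code or its eventual fix. The class name `PropagatingFlateDecoder`, the `decode(InputStream, OutputStream)` signature, the `deflate` test helper and the buffer size are assumptions for illustration. Because `ZipException` and `EOFException` are both subclasses of `IOException`, simply not catching them is enough to report a corrupt or truncated stream upstream; `OutOfMemoryError` is likewise left alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class PropagatingFlateDecoder {

    private static final int BUFFER_SIZE = 2048; // assumed, not PDFBox's value

    // Hypothetical decode method: inflates zlib/Flate data from 'compressed'
    // into 'result'. Nothing is caught, so a ZipException (corrupt data),
    // an EOFException (truncated data) or an OutOfMemoryError all reach
    // the caller instead of being logged and swallowed.
    public static void decode(InputStream compressed, OutputStream result) throws IOException {
        Inflater inflater = new Inflater();
        try {
            InflaterInputStream decompressor = new InflaterInputStream(compressed, inflater);
            byte[] buffer = new byte[BUFFER_SIZE];
            int amountRead;
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        } finally {
            inflater.end(); // free native zlib memory even when decoding fails
        }
    }

    // Test helper: zlib-compress 'data' so there is valid input to decode.
    public static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(deflate(original)), out);
        System.out.println("decoded: " + new String(out.toByteArray(), StandardCharsets.US_ASCII));

        try {
            decode(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}), new ByteArrayOutputStream());
        } catch (IOException e) {
            // the caller sees the failure and can mark extraction as incomplete
            System.out.println("caller notified: " + e);
        }
    }
}
```

Design note: the sketch passes its own `Inflater` and calls `end()` in `finally`, so native zlib memory is released even when decoding fails, while the caller's input stream is left open for the caller to close.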