[ https://issues.apache.org/jira/browse/PDFBOX-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192102#comment-13192102 ]
Lau Brino commented on PDFBOX-847:
----------------------------------
The proper fix is not to catch the OutOfMemoryError at all. Once you are in
this state, you may not even be able to write anything to a log, and in a
multithreaded application you cannot guarantee anything.
So please do not catch OutOfMemoryError at all.
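
A minimal sketch of what that could look like, assuming a hypothetical
standalone helper rather than the actual FlateFilter code. The class name
FlateDecodeSketch, the method decode, and the buffer size are made up for
illustration. ZipException and EOFException are wrapped and rethrown so the
caller learns that decoding is incomplete, and there is no catch clause for
OutOfMemoryError, so it simply propagates:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

// Hypothetical helper for illustration only; this is not the PDFBox API.
final class FlateDecodeSketch {
    private static final int BUFFER_SIZE = 16384; // illustrative value

    // Inflates 'compressed' into 'result' and reports corrupt input to the caller.
    static void decode(InputStream compressed, OutputStream result) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int amountRead;
        try (InputStream decompressor = new InflaterInputStream(compressed)) {
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        } catch (ZipException | EOFException exception) {
            // Rethrow instead of logging and continuing, so the failure is visible upstream.
            throw new IOException("Corrupt flate stream, decoding is incomplete", exception);
        }
        // Deliberately no catch for OutOfMemoryError: it propagates unchanged.
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "some text to compress".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(packed)) {
            deflater.write(original);
        }
        byte[] compressed = packed.toByteArray();

        // Intact stream: decoding should reproduce the input.
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(compressed), decoded);
        System.out.println("round trip ok: " + Arrays.equals(original, decoded.toByteArray()));

        // Truncated stream: the caller now gets an exception instead of silence.
        byte[] truncated = Arrays.copyOf(compressed, compressed.length / 2);
        try {
            decode(new ByteArrayInputStream(truncated), new ByteArrayOutputStream());
            System.out.println("no error reported");
        } catch (IOException e) {
            System.out.println("caller was told: " + e.getMessage());
        }
    }
}
```

Since ZipException and EOFException are both subclasses of IOException,
dropping the catch block entirely would also report the failure. The wrapping
here only adds a message that names the cause.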
> FlateFilter.java swallows Exceptions (should rethrow)
> -----------------------------------------------------
>
> Key: PDFBOX-847
> URL: https://issues.apache.org/jira/browse/PDFBOX-847
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.2.1
> Reporter: Andreas Wollschlaeger
> Assignee: Andreas Lehmkühler
> Fix For: 1.7.0
>
>
> I just rediscovered an issue in FlateFilter.java that I mentioned quite a
> while ago on the mailing list, and which was agreed to be a misfeature :-)
> In FlateFilter.java, at lines 115ff, we find this piece of code:
>     try
>     {
>         // decoding not needed
>         while ((amountRead = decompressor.read(buffer, 0,
>                 Math.min(mayRead, BUFFER_SIZE))) != -1)
>         {
>             result.write(buffer, 0, amountRead);
>         }
>     }
>     catch (OutOfMemoryError exception)
>     {
>         // if the stream is corrupt an OutOfMemoryError may occur
>         log.error("Stop reading corrupt stream");
>     }
>     catch (ZipException exception)
>     {
>         // if the stream is corrupt an OutOfMemoryError may occur
>         log.error("Stop reading corrupt stream");
>     }
>     catch (EOFException exception)
>     {
>         // if the stream is corrupt an OutOfMemoryError may occur
>         log.error("Stop reading corrupt stream");
>     }
> which means these exceptions are discarded and not reported upstream to the
> caller. This is very unfortunate, as the caller has no means to discover that
> text extraction is incomplete. I discovered this while troubleshooting Alfresco
> DMS, which uses PDFBox for indexing PDF documents: apart from an innocent log
> message, Alfresco does not know that conversion has failed.
> The proposed solution is to rethrow all 3 exceptions and let the caller handle
> them.