[ https://issues.apache.org/jira/browse/PDFBOX-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192178#comment-13192178 ]
Adam Nichols commented on PDFBOX-847:
-------------------------------------
Should we be catching ZipException and EOFException here without letting the
caller know that it was unable to decompress the stream?
As for the proposed solution of catching them and then re-throwing them, why
not simply leave them uncaught? The caller will be able to log them if it sees
fit, and it will have the stack trace pointing to the exact line that caused
the issue (e.g. was it a problem reading or a problem writing?).
If the exception is wrapped in a new one and re-thrown, the new exception's
stack trace will point to the catch block, which is less helpful.
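To make the stack-trace argument concrete, here is a small, self-contained Java sketch comparing three options: not catching, catching and re-throwing the same object, and wrapping in a new exception. `readStream` is a hypothetical stand-in for the failing decompressor read; all class and method names are illustrative only. In Java the stack trace is recorded when the exception object is constructed, so re-throwing the same object keeps its original frames; only wrapping makes the outer trace start at the catch block, with the original still reachable through `getCause()`.

```java
import java.io.EOFException;
import java.io.IOException;

public class StackTraceDemo {
    // Hypothetical stand-in for the low-level read that fails on a corrupt stream.
    static void readStream() throws IOException {
        throw new EOFException("Unexpected end of ZLIB input stream");
    }

    // Option 1: do not catch at all; the exception propagates unchanged.
    static void noCatch() throws IOException {
        readStream();
    }

    // Option 2: catch and re-throw the same exception object.
    static void rethrowSame() throws IOException {
        try {
            readStream();
        } catch (EOFException e) {
            throw e;
        }
    }

    // Option 3: wrap in a new exception, keeping the original as the cause.
    static void wrap() throws IOException {
        try {
            readStream();
        } catch (EOFException e) {
            throw new IOException("Could not decompress stream", e);
        }
    }

    // Name of the method where the given throwable's stack trace starts.
    static String topFrame(Throwable t) {
        return t.getStackTrace()[0].getMethodName();
    }

    public static void main(String[] args) {
        try {
            noCatch();
        } catch (IOException e) {
            System.out.println("no catch: " + topFrame(e));
        }
        try {
            rethrowSame();
        } catch (IOException e) {
            System.out.println("re-throw: " + topFrame(e));
        }
        try {
            wrap();
        } catch (IOException e) {
            System.out.println("wrapped:  " + topFrame(e) + ", cause at " + topFrame(e.getCause()));
        }
    }
}
```

Running `main` reports `readStream` as the top frame for the first two options and `wrap` for the third, whose cause still starts at `readStream`.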
> FlateFilter.java swallows Exceptions (should rethrow)
> -----------------------------------------------------
>
> Key: PDFBOX-847
> URL: https://issues.apache.org/jira/browse/PDFBOX-847
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.2.1
> Reporter: Andreas Wollschlaeger
> Assignee: Andreas Lehmkühler
> Fix For: 1.7.0
>
>
> I just re-discovered an issue in FlateFilter.java, which I mentioned quite a
> while ago on the mailing list and which was agreed to be a misfeature :-)
> In FlateFilter.java, at lines 115ff, we find this piece of code:
> try
> {
>     // decoding not needed
>     while ((amountRead = decompressor.read(buffer, 0,
>             Math.min(mayRead, BUFFER_SIZE))) != -1)
>     {
>         result.write(buffer, 0, amountRead);
>     }
> }
> catch (OutOfMemoryError exception)
> {
>     // if the stream is corrupt an OutOfMemoryError may occur
>     log.error("Stop reading corrupt stream");
> }
> catch (ZipException exception)
> {
>     // if the stream is corrupt an OutOfMemoryError may occur
>     log.error("Stop reading corrupt stream");
> }
> catch (EOFException exception)
> {
>     // if the stream is corrupt an OutOfMemoryError may occur
>     log.error("Stop reading corrupt stream");
> }
> This means these exceptions are discarded and never reported upstream to the
> caller. This is very unfortunate, as the caller has no means of discovering
> that text extraction is incomplete. I discovered this while troubleshooting
> Alfresco DMS, which uses PDFBox for indexing PDF documents: apart from an
> innocuous log message, Alfresco does not know that the conversion has failed.
> The proposed solution is to re-throw all three exceptions and let the caller
> handle them.
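A minimal sketch of the "don't catch" behaviour discussed above, assuming the decompressor is a `java.util.zip.InflaterInputStream` reading zlib data (the format behind PDF's FlateDecode filter); the quoted snippet does not show how `decompressor` is created. `FlateDecodeSketch`, `decode`, `index`, and the `BUFFER_SIZE` value are hypothetical names, not PDFBox's API, and the `Math.min(mayRead, ...)` limit from the original loop is left out. The decode loop has no catch block, so a truncated stream surfaces as an `EOFException` and invalid data as a `ZipException`, and a caller such as an indexer can tell the difference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class FlateDecodeSketch {
    private static final int BUFFER_SIZE = 2048; // hypothetical buffer size

    // No catch block: ZipException and EOFException (both subclasses of
    // IOException) and OutOfMemoryError all reach the caller unchanged.
    static void decode(InputStream compressed, OutputStream result) throws IOException {
        Inflater inflater = new Inflater();
        try {
            InputStream decompressor = new InflaterInputStream(compressed, inflater);
            byte[] buffer = new byte[BUFFER_SIZE];
            int amountRead;
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        } finally {
            inflater.end(); // free native zlib memory without closing the caller's stream
        }
    }

    // Hypothetical caller, e.g. an indexer: it can now report a failed decode
    // instead of silently indexing incomplete text.
    static String index(byte[] data) {
        ByteArrayOutputStream text = new ByteArrayOutputStream();
        try {
            decode(new ByteArrayInputStream(data), text);
            return "indexed " + text.size() + " bytes";
        } catch (IOException e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    // Helper that builds a valid zlib stream for the demo.
    static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = new byte[10000];
        Arrays.fill(plain, (byte) 'a');
        byte[] good = deflate(plain);
        byte[] truncated = Arrays.copyOf(good, good.length / 2);
        // Valid zlib header followed by a reserved block type: invalid data.
        byte[] garbage = {0x78, (byte) 0x9c, (byte) 0xff};

        System.out.println(index(good));      // indexed 10000 bytes
        System.out.println(index(truncated)); // failed: EOFException
        System.out.println(index(garbage));   // failed: ZipException
    }
}
```

Leaving the exceptions uncaught keeps `decode` simple and preserves the original stack traces; deciding whether a partially decoded stream is still useful becomes the caller's choice rather than the filter's.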
--
This message is automatically generated by JIRA.