[ 
https://issues.apache.org/jira/browse/PDFBOX-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192122#comment-13192122
 ] 

Timo Boehme commented on PDFBOX-847:
------------------------------------

I second the previous comment about OutOfMemoryError. Since the error is 
re-thrown, there is no added value in catching it first (hiding the error would 
be bad as well). From an API point of view, catching the error is also of no 
use: it is not declared as a possible 'exception', and there is no need to 
declare it, since an OutOfMemoryError may occur at any point. If an application 
decides to catch OOM, or the JVM is set up to print the OOM stack trace, it can 
still be deduced that the OOM originated in FlateFilter (however, only in 
single-threaded applications; in multi-threaded applications the OOM might be 
triggered by another thread, which is another point against catching it).
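
To illustrate the point, here is a minimal sketch of a decode loop with no 
catch blocks at all. It is not the actual FlateFilter code: the class name 
FlateDecodeSketch and the helpers decode() and compress() are hypothetical and 
exist only for this example. ZipException and EOFException are both subclasses 
of IOException, so they reach the caller through the declared 'throws' clause, 
and an OutOfMemoryError propagates without being declared or caught.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stand-in for the decode loop; not the PDFBox FlateFilter API.
public class FlateDecodeSketch {
    private static final int BUFFER_SIZE = 16384;

    // Inflates 'compressed' into 'result'. Note: closes 'compressed' when done.
    public static void decode(InputStream compressed, OutputStream result) throws IOException {
        try (InflaterInputStream decompressor = new InflaterInputStream(compressed)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int amountRead;
            // No catch blocks: a corrupt stream surfaces as an IOException
            // (ZipException or EOFException), and OutOfMemoryError is left alone.
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        }
    }

    // Helper for the demo only: zlib-compresses a byte array.
    public static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A valid stream decodes normally.
        byte[] original = "hello flate".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(compress(original)), decoded);
        System.out.println(new String(decoded.toByteArray(), StandardCharsets.UTF_8));

        // Bytes that are not a zlib stream: the failure reaches the caller,
        // which can now tell that decoding was incomplete.
        byte[] corrupt = {0x01, 0x02, 0x03, 0x04};
        try {
            decode(new ByteArrayInputStream(corrupt), new ByteArrayOutputStream());
        } catch (IOException e) {
            System.out.println("decoding failed: " + e);
        }
    }
}
```

With this shape, a caller such as an indexer can decide for itself whether to 
log, skip the document, or abort, instead of silently receiving partial output.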
                
> FlateFilter.java swallows Exceptions (should rethrow)
> -----------------------------------------------------
>
>                 Key: PDFBOX-847
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-847
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.2.1
>            Reporter: Andreas Wollschlaeger
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.7.0
>
>
> I just re-discovered an issue in FlateFilter.java, which I mentioned quite a 
> while ago on the mailing list, and which was agreed to be a misfeature :-)
> In FlateFilter.java, at lines 115ff, we find this piece of code:
>                     try 
>                     {
>                         // decoding not needed
>                         while ((amountRead = decompressor.read(buffer, 0, Math.min(mayRead,BUFFER_SIZE))) != -1)
>                         {
>                             result.write(buffer, 0, amountRead);
>                         }
>                     }
>                     catch (OutOfMemoryError exception) 
>                     {
>                         // if the stream is corrupt an OutOfMemoryError may occur
>                         log.error("Stop reading corrupt stream");
>                     }
>                     catch (ZipException exception) 
>                     {
>                         // if the stream is corrupt an OutOfMemoryError may occur
>                         log.error("Stop reading corrupt stream");
>                     }
>                     catch (EOFException exception) 
>                     {
>                         // if the stream is corrupt an OutOfMemoryError may occur
>                         log.error("Stop reading corrupt stream");
>                     }
> which means these Exceptions are discarded and not reported upstream to the 
> caller. This is very unfortunate, as the caller has no means to discover that 
> text extraction is incomplete. I discovered this while troubleshooting Alfresco 
> DMS, which uses PDFBox for indexing PDF documents: apart from an innocuous log 
> message, Alfresco does not know that conversion has failed.
> The proposed solution is to re-throw all 3 Exceptions and let the caller 
> handle them.


