[ 
https://issues.apache.org/jira/browse/TIKA-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208008#comment-14208008
 ] 

Tim Allison edited comment on TIKA-1471 at 11/12/14 1:00 PM:
-------------------------------------------------------------

From the discussion on PDFBOX-2493, this looks to be solved by PDFBox 1.8.7, 
which we're now using in trunk.

Thank you, [~alanbur], for reporting this issue on both Tika and PDFBox.  We 
need to fix these serious errors as they are discovered.  

At this point, code that uses Tika needs to be able to handle regular 
exceptions, OOM errors and permanent hangs...these catastrophic errors will 
happen...rarely...but they do happen.  

Use of the ForkParser and tika server can help avoid some of these issues, and 
on TIKA-1330, we're working to develop a robust wrapper around Tika that can 
handle these types of problems so that every integrator doesn't have to 
reinvent the wheel.
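
As a rough illustration of the defensive handling described above, here is a 
minimal Java sketch that runs a parse on a worker thread with a timeout and 
reports exceptions, OOM errors and hangs as a status instead of letting them 
propagate.  It uses only the JDK.  GuardedParse, Status, Result and run are 
hypothetical names, not part of Tika, and the Callable stands in for whatever 
Tika call the integrator actually makes.  It is not the wrapper being developed 
on TIKA-1330.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not a Tika class.
final class GuardedParse {

    /** Outcome of one guarded parse attempt. */
    enum Status { OK, FAILED, TIMED_OUT }

    static final class Result {
        final Status status;
        final String text;      // extracted text, or null if the parse did not finish
        final Throwable cause;  // what went wrong, or null on success

        Result(Status status, String text, Throwable cause) {
            this.status = status;
            this.text = text;
            this.cause = cause;
        }
    }

    /**
     * Runs parseTask on a daemon worker thread and waits at most timeoutMillis.
     * parseTask is an assumption: it stands in for the caller's own Tika call.
     */
    static Result run(Callable<String> parseTask, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-parse");
            t.setDaemon(true); // a hung parse must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(parseTask);
        try {
            String text = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return new Result(Status.OK, text, null);
        } catch (TimeoutException e) {
            // Best effort only: cancel(true) interrupts the worker, it cannot kill it.
            future.cancel(true);
            return new Result(Status.TIMED_OUT, null, e);
        } catch (ExecutionException e) {
            // The cause is whatever the task threw, including Errors such as
            // OutOfMemoryError raised on the worker thread.
            return new Result(Status.FAILED, null, e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            future.cancel(true);
            return new Result(Status.FAILED, null, e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would check result.status and log or skip the document rather than 
let one bad file take down the application.  An in-process guard like this is 
only a partial answer: a timed-out thread can be interrupted but not forcibly 
stopped, and after a real OutOfMemoryError the JVM may be left in an unreliable 
state.  Running the parser in a separate process, as the ForkParser and tika 
server do, isolates the calling application from those failures.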




was (Author: [email protected]):
From the discussion on PDFBOX-2493, this looks to be solved by PDFBox 1.8.8.  
I'll leave this open until we upgrade.

Thank you, [~alanbur], for reporting this issue on both Tika and PDFBox.  We 
need to fix these serious errors as they are discovered.  

At this point, code that uses Tika needs to be able to handle regular 
exceptions, OOM errors and permanent hangs...these catastrophic errors will 
happen...rarely...but they do happen.  

Use of the ForkParser and tika server can help avoid some of these issues, and 
on TIKA-1330, we're working to develop a robust wrapper around Tika that can 
handle these types of problems so that every integrator doesn't have to 
reinvent the wheel.



> OOM with corrupt PDF file
> -------------------------
>
>                 Key: TIKA-1471
>                 URL: https://issues.apache.org/jira/browse/TIKA-1471
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.6
>         Environment: Linux, JVM 1.8.0_25-b17, 64-bit
>            Reporter: Alan Burlison
>            Priority: Blocker
>             Fix For: 1.7
>
>
> Use of PDFBox 1.8.6 by Tika 1.6 is causing OOM errors with corrupt PDF files, 
> due to a bug in PDFBox, see PDFBOX-2493. This makes Tika 1.6 unusable from 
> inside a long-running webapp and I've had to revert to Tika 1.5. Although 1.5 
> also throws errors with the corrupt file it does not cause OOM errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
