Nicholas DiPiazza created PDFBOX-3856:
-----------------------------------------
Summary: Non-large PDF's can cause Out of Memory Exceptions
Key: PDFBOX-3856
URL: https://issues.apache.org/jira/browse/PDFBOX-3856
Project: PDFBox
Issue Type: Bug
Reporter: Nicholas DiPiazza
Priority: Blocker
Attachments: Pasted image at 2017_07_05 02_26 PM.png
We run an application that makes PDFs searchable using Apache Tika, which in
turn uses PDFBox to parse each PDF and extract its body text so that it can be
searched.
We accept basically any PDF from any source, as long as it isn't too large
(the limit is 9 MB).
However, we are noticing that some PDFs, even though they are not large in file
size, behave like zip bombs: parsing them eats up all the heap space and
crashes the JVM.
We see an Object[] array that holds millions of
{{org.apache.pdfbox.text.TextPosition}} instances.
Is there a setting to limit the size of this particular array so that it
doesn't cause a memory bomb?
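
To illustrate the kind of limit being asked about, here is a minimal sketch of a
collector that refuses to grow past a fixed cap. This is not an existing PDFBox
setting: the class name BoundedCollector, the exception type, and the cap value
are all hypothetical. One place such a check might be hooked in is a subclass
that overrides the protected PDFTextStripper.processTextPosition(TextPosition)
method, but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical guard: collects items but fails fast once a cap is reached,
 * instead of letting the backing list grow until the heap is exhausted.
 */
public class BoundedCollector<T> {

    /** Thrown when the cap would be exceeded, so the caller can abandon the document. */
    public static class LimitExceededException extends RuntimeException {
        public LimitExceededException(String message) {
            super(message);
        }
    }

    private final int maxItems;
    private final List<T> items = new ArrayList<>();

    public BoundedCollector(int maxItems) {
        if (maxItems < 0) {
            throw new IllegalArgumentException("maxItems must be >= 0");
        }
        this.maxItems = maxItems;
    }

    /** Adds an item, or throws if the collector already holds maxItems items. */
    public void add(T item) {
        if (items.size() >= maxItems) {
            throw new LimitExceededException(
                    "more than " + maxItems + " items collected");
        }
        items.add(item);
    }

    public int size() {
        return items.size();
    }
}
```

With a guard like this, a caller could catch LimitExceededException, skip the
offending PDF, and keep the JVM alive rather than running out of heap.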