[ https://issues.apache.org/jira/browse/PDFBOX-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076003#comment-16076003 ]
Tilman Hausherr edited comment on PDFBOX-3856 at 7/6/17 10:36 AM:
------------------------------------------------------------------
Tika is at 1.15. PDFBox is at 2.0.6. How about testing with the current
version? There was a memory problem related to -TextPosition leaks- high memory
usage in text extraction a few months ago (PDFBOX-3442).
If it still doesn't work, please attach a PDF. Also mention what -Xmx setting
you are using.
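To report the -Xmx setting, you can read the heap ceiling the JVM is actually running with from Runtime.maxMemory(). Below is a minimal sketch. The HeapInfo class name is hypothetical. maxMemory() may come out slightly below the -Xmx value, because some JVMs leave part of the young generation out of that figure.

```java
/** Hypothetical helper: reports the heap ceiling the running JVM was given. */
final class HeapInfo {
    /**
     * Maximum heap in MiB as reported by Runtime.maxMemory(), which roughly
     * tracks -Xmx. Returns -1 when the JVM reports no inherent limit.
     */
    static long maxHeapMiB() {
        long max = Runtime.getRuntime().maxMemory();
        // maxMemory() returns Long.MAX_VALUE when there is no inherent limit
        if (max == Long.MAX_VALUE) {
            return -1;
        }
        return max / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long mib = maxHeapMiB();
        System.out.println(mib < 0 ? "max heap: unlimited" : "max heap: " + mib + " MiB");
    }
}
```

Run it with the same JVM flags as the Tika process (for example java -Xmx512m HeapInfo) so that the reported number matches the failing setup.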
was (Author: tilman):
Tika is at 1.15. PDFBox is at 2.0.6. How about testing with the current
version? There was a memory problem related to TextPosition leaks a few months
ago (I can't find the issue right now).
If it still doesn't work, please attach a PDF. Also mention what -Xmx setting
you are using.
> Non-large PDF's can cause Out of Memory Exceptions
> --------------------------------------------------
>
> Key: PDFBOX-3856
> URL: https://issues.apache.org/jira/browse/PDFBOX-3856
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.1
> Reporter: Nicholas DiPiazza
> Priority: Blocker
> Attachments: Pasted image at 2017_07_05 02_26 PM.png
>
>
> Tika version: 1.13
> PDFBox Version: 2.0.1
> We are using an application that makes PDFs searchable with Apache Tika,
> which uses PDFBox downstream to parse each PDF and extract its body text
> for indexing.
> We accept essentially any PDF from any source, as long as it is no larger
> than 9 MB.
> However, we are noticing that some PDFs, even though their file size is
> small, act like zip bombs: they eat up all the heap space and crash the JVM.
> The heap holds some kind of Object[] array containing millions of
> {code}org.apache.pdfbox.text.TextPosition{code} objects.
> Here is a snapshot of the heap dump:
> https://issues.apache.org/jira/secure/attachment/12875808/Pasted%20image%20at%202017_07_05%2002_26%20PM.png
> Is there a setting to limit the size of this particular array so that it
> doesn't cause a memory bomb?
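As far as this thread shows, PDFBox 2.0.x has no dedicated setting that caps this array. Under that assumption, one possible workaround is to stop collecting text positions once a fixed limit is reached and to flag the document as truncated. The sketch below is plain Java with no PDFBox dependency. CappedCollector and its limit are hypothetical. In a real integration the same check would presumably live in a PDFTextStripper subclass that overrides processTextPosition(TextPosition) and stops delegating to super once the cap is hit. Verify that hook against the PDFBox version you actually use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: keeps at most maxItems items and records whether any
 * were dropped. Stands in for the list of TextPosition objects collected
 * during extraction.
 */
final class CappedCollector<T> {
    private final int maxItems;
    private final List<T> items = new ArrayList<>();
    private boolean truncated = false;

    CappedCollector(int maxItems) {
        if (maxItems < 0) {
            throw new IllegalArgumentException("maxItems must be >= 0");
        }
        this.maxItems = maxItems;
    }

    /** Keeps the item if under the cap; otherwise drops it and marks truncation. */
    boolean add(T item) {
        if (items.size() >= maxItems) {
            truncated = true;
            return false;
        }
        items.add(item);
        return true;
    }

    /** Read-only view of the items that were kept. */
    List<T> items() {
        return Collections.unmodifiableList(items);
    }

    /** True if at least one item was dropped because of the cap. */
    boolean isTruncated() {
        return truncated;
    }

    public static void main(String[] args) {
        CappedCollector<String> c = new CappedCollector<>(3);
        for (String s : new String[] {"a", "b", "c", "d", "e"}) {
            c.add(s);
        }
        // prints "[a, b, c] truncated=true"
        System.out.println(c.items() + " truncated=" + c.isTruncated());
    }
}
```

The trade-off is incomplete text for pathological documents. The truncation flag lets the caller log, skip, or partially index such a file instead of crashing the JVM. A cap like this bounds only this one list, not other allocations made while parsing.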
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)