[
https://issues.apache.org/jira/browse/PDFBOX-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884681#comment-15884681
]
Tilman Hausherr commented on PDFBOX-3700:
-----------------------------------------
I created a dump and had a look at it.
No PDImageXObject objects found in the class list.
Looking at the SoftReference objects instead:
7 instances of PDImageXObject that are not referenced from anywhere else, and
3 BufferedImage instances.
The biggest entry in the dump is int[], 6 instances, which comes from BufferedImage
- 15 instances of BufferedImage
- 14 instances of BufImgSurfaceData
- 13 instances of BufImgSurfaceManager
Sadly, none of this points to a cause in PDFBox itself, i.e. that we hold a
PDImageXObject or a BufferedImage for too long and thus prevent the
SoftReference'd objects from being reclaimed by the GC.
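To illustrate what was being looked for, here is a minimal sketch using only the JDK (no PDFBox classes). The class name and the {{held}} list are made up for the example; the list stands in for any place that keeps a strong reference to image data. As long as such a reference exists, the referent of a SoftReference cannot be cleared, no matter how tight memory gets.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of PDFBox.
class SoftReferenceSketch {

    /**
     * Returns true if the softly referenced array is still available
     * while a strong reference to it is also being held.
     */
    static boolean survivesWhileHeld() {
        List<int[]> held = new ArrayList<>();   // stands in for "holding it too long"
        int[] pixels = new int[1 << 20];        // pretend this is image raster data
        SoftReference<int[]> cached = new SoftReference<>(pixels);

        held.add(pixels);
        pixels = null;                          // the local variable no longer pins it
        System.gc();                            // only a hint, but it cannot clear the
                                                // referent: 'held' still reaches it

        // Comparing against held.get(0) keeps the list in use until after the check.
        return cached.get() != null && cached.get() == held.get(0);
    }

    public static void main(String[] args) {
        System.out.println("still cached while held: " + survivesWhileHeld());
        // Once nothing but the SoftReference points at the array, the JVM is
        // free to clear it before throwing an OutOfMemoryError.
    }
}
```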
Thus, Plan B.
[~msahyoun] please add this to the FAQ in the rendering segment:
Q: I'm getting an OutOfMemoryError. What can I do?
A: The memory footprint depends on the PDF itself and on the resolution you use
for rendering. Some possible options:
- increase the {{-Xmx}} value when starting java
- be careful not to hold your images after rendering them, e.g. avoid putting
all images of a PDF into a {{List}}
- don't forget to close your {{PDDocument}} objects
- decrease the scale when calling {{PDFRenderer.renderImage()}}, or the dpi
value when calling {{PDFRenderer.renderImageWithDPI()}}
- disable the cache for {{PDImageXObject}} objects by calling
{{PDDocument.setResourceCache()}} with a cache object that is derived from
{{DefaultResourceCache}} and whose {{public void put(COSObject indirect,
PDXObject xobject)}} method does nothing. Be aware that this will slow down
rendering for PDF files that have an identical image on several pages (e.g. a
company logo or a background). More about this can be read in PDFBOX-3700
https://issues.apache.org/jira/browse/PDFBOX-3700 .
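To give a feeling for how much the resolution matters, here is a rough back-of-the-envelope sketch in plain Java (no PDFBox calls). It assumes that a scale of 1 corresponds to 72 dpi, that the rendered page ends up in an int-backed BufferedImage with 4 bytes per pixel, and that rounding works as shown; the class and method names are made up for the example. Real usage will be higher because decoded images and temporary buffers are not counted.

```java
// Hypothetical helper, not part of PDFBox: estimates the raster size of one
// rendered page from its size in points and the requested dpi.
class RenderMemoryEstimate {

    /** PDF user space units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Assumption: one int (4 bytes) per pixel, as in TYPE_INT_RGB / TYPE_INT_ARGB. */
    static final long BYTES_PER_PIXEL = 4L;

    static long estimateBytes(double widthPt, double heightPt, double dpi) {
        double scale = dpi / POINTS_PER_INCH;
        long widthPx = Math.round(widthPt * scale);
        long heightPx = Math.round(heightPt * scale);
        return widthPx * heightPx * BYTES_PER_PIXEL;
    }

    public static void main(String[] args) {
        // US Letter page: 612 x 792 points
        for (double dpi : new double[] {72, 150, 300, 600}) {
            long bytes = estimateBytes(612, 792, dpi);
            System.out.printf("%4.0f dpi: %,d bytes per page%n", dpi, bytes);
        }
    }
}
```

Doubling the dpi quadruples the estimate, so with several documents rendered in parallel the per-page raster alone can take a large share of the heap.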
> OutOfMemoryException converting PDF to TIFF Images
> --------------------------------------------------
>
> Key: PDFBOX-3700
> URL: https://issues.apache.org/jira/browse/PDFBOX-3700
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.4
> Reporter: Viraf Bankwalla
> Attachments: jira-pdfbox-3700.zip
>
>
> I am using PDFBox to convert PDF documents to a series of TIFF images (one
> for each page). The implementation uses PDFRenderer to render each page.
> Things work fine when I am processing a single document in a single thread,
> however when I try to process multiple documents (each in its own thread) I
> get an OutOfMemoryException.
> In analyzing the heap dump, I see that this is caused by the images cached in
> DefaultResourceCache. Objects are added to the cache in PDResources, which
> includes a method private boolean isAllowedCache(PDXObject xobject) that is
> used to determine whether a PDXObject can be cached. I have extended this
> to filter out COSName.IMAGE, and am now able to process multiple documents in
> parallel.
> A proposed fix would be to include Images in the set of objects not to add to
> the cache. For example, the following could be added to
> PDResources.isAllowedCache
> {code:title=Bar.java|borderStyle=solid}
> COSBase image = xobject.getCOSObject().getDictionaryObject(COSName.SUBTYPE);
> if (image instanceof COSName && ((COSName) image).equals(COSName.IMAGE))
> {
> return false;
> }
> {code}
> A possible patch is enclosed below. I would like to get a fix in for the
> next release.
> diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDResources.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDResources.java
> index 6e1e464..aa94122 100644
> --- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDResources.java
> +++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDResources.java
> @@ -31,15 +31,15 @@
> import org.apache.pdfbox.pdmodel.documentinterchange.markedcontent.PDPropertyList;
> import org.apache.pdfbox.pdmodel.font.PDFont;
> import org.apache.pdfbox.pdmodel.font.PDFontFactory;
> +import org.apache.pdfbox.pdmodel.graphics.PDXObject;
> +import org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace;
> import org.apache.pdfbox.pdmodel.graphics.color.PDPattern;
> import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
> +import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
> import org.apache.pdfbox.pdmodel.graphics.optionalcontent.PDOptionalContentGroup;
> -import org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState;
> -import org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace;
> import org.apache.pdfbox.pdmodel.graphics.pattern.PDAbstractPattern;
> import org.apache.pdfbox.pdmodel.graphics.shading.PDShading;
> -import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
> -import org.apache.pdfbox.pdmodel.graphics.PDXObject;
> +import org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState;
>
> /**
> * A set of resources available at the page/pages/stream level.
> @@ -445,6 +445,12 @@
> return false;
> }
> }
> +
> + COSBase image = xobject.getCOSObject().getDictionaryObject(COSName.SUBTYPE);
> + if (image instanceof COSName && ((COSName) image).equals(COSName.IMAGE))
> + {
> + return false;
> + }
> }
> return true;
> }