[ https://issues.apache.org/jira/browse/PDFBOX-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993392#comment-15993392 ]

Tilman Hausherr commented on PDFBOX-3700:
-----------------------------------------

In 2.0.6 (probably in about a month), memory footprint and speed will improve 
thanks to PDFBOX-3768, although the GC problem described here won't go away. 
This improvement applies only to 1-bit images like the ones in your test case.
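For a sense of scale, the JDK-only sketch below compares the raster memory of a 1-bit image with that of a 32-bit ARGB image of the same dimensions. It is an illustration only: the comment does not say how PDFBOX-3768 achieves its savings, and the page size and resolution (US Letter at 300 DPI, i.e. 2550 x 3300 pixels) are assumptions, as is the hypothetical class name PageFootprint. A bilevel raster packs eight pixels into each byte while ARGB uses four bytes per pixel, so the figures come out at about 1.05 MB versus about 33.7 MB, a factor of roughly 32.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;

/**
 * Illustration only (hypothetical class, not part of PDFBox): compares how many
 * bytes a BufferedImage raster needs for a 1-bit image versus a 32-bit ARGB image.
 */
public class PageFootprint {

    /** Bytes backing the raster of a freshly created image of the given size and type. */
    public static long rasterBytes(int width, int height, int imageType) {
        BufferedImage img = new BufferedImage(width, height, imageType);
        DataBuffer buf = img.getRaster().getDataBuffer();
        // getSize() is the element count of each bank; convert elements to bytes
        long bytesPerElement = DataBuffer.getDataTypeSize(buf.getDataType()) / 8;
        return (long) buf.getSize() * buf.getNumBanks() * bytesPerElement;
    }

    public static void main(String[] args) {
        // Assumed page: US Letter (8.5 x 11 in) at 300 DPI
        int width = 2550;
        int height = 3300;
        long binary = rasterBytes(width, height, BufferedImage.TYPE_BYTE_BINARY); // 8 pixels per byte
        long argb = rasterBytes(width, height, BufferedImage.TYPE_INT_ARGB);      // 4 bytes per pixel
        System.out.printf("1-bit: %,d bytes%n", binary);
        System.out.printf("ARGB:  %,d bytes (about %.0fx more)%n", argb, (double) argb / binary);
    }
}
```

Running main prints both totals; the exact digit grouping depends on the default locale.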

> OutOfMemoryException converting PDF to TIFF Images
> --------------------------------------------------
>
>                 Key: PDFBOX-3700
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3700
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.4
>            Reporter: Viraf Bankwalla
>             Fix For: 2.0.5, 3.0.0
>
>         Attachments: jira-pdfbox-3700.zip
>
>
> I am using PDFBox to convert PDF documents to a series of TIFF images (one 
> for each page). The implementation uses PDFRenderer to render each page. 
> Things work fine when I process a single document in a single thread; 
> however, when I process multiple documents in parallel (each in its own 
> thread), I get an OutOfMemoryError.
> Analyzing the heap dump, I see that this is caused by the images cached in 
> DefaultResourceCache. Objects are added to the cache in PDResources, whose 
> private method isAllowedCache(PDXObject xobject) determines whether a 
> PDXObject can be cached. I have extended this method to also reject 
> XObjects whose subtype is COSName.IMAGE, and am now able to process 
> multiple documents in parallel.
> A proposed fix would be to add images to the set of objects that are not 
> cached. For example, the following could be added to 
> PDResources.isAllowedCache:
> {code:title=PDResources.java|borderStyle=solid}
> // Do not cache image XObjects: decoded images can be large, and keeping them
> // in the cache exhausts the heap when documents are rendered in parallel
> COSBase subtype = xobject.getCOSObject().getDictionaryObject(COSName.SUBTYPE);
> if (COSName.IMAGE.equals(subtype))
> {
>     return false;
> }
> {code}
> A possible patch is enclosed below.  I would like to get a fix in for the 
> next release.
> diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDResources.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDResources.java
> index 6e1e464..aa94122 100644
> --- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDResources.java
> +++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDResources.java
> @@ -31,15 +31,15 @@
>  import org.apache.pdfbox.pdmodel.documentinterchange.markedcontent.PDPropertyList;
>  import org.apache.pdfbox.pdmodel.font.PDFont;
>  import org.apache.pdfbox.pdmodel.font.PDFontFactory;
> +import org.apache.pdfbox.pdmodel.graphics.PDXObject;
> +import org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace;
>  import org.apache.pdfbox.pdmodel.graphics.color.PDPattern;
>  import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
> +import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
>  import org.apache.pdfbox.pdmodel.graphics.optionalcontent.PDOptionalContentGroup;
> -import org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState;
> -import org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace;
>  import org.apache.pdfbox.pdmodel.graphics.pattern.PDAbstractPattern;
>  import org.apache.pdfbox.pdmodel.graphics.shading.PDShading;
> -import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
> -import org.apache.pdfbox.pdmodel.graphics.PDXObject;
> +import org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState;
>  
>  /**
>   * A set of resources available at the page/pages/stream level.
> @@ -445,6 +445,12 @@
>                      return false;
>                  }
>              }
> +
> +            COSBase image = xobject.getCOSObject().getDictionaryObject(COSName.SUBTYPE);
> +            if (image instanceof COSName && ((COSName) image).equals(COSName.IMAGE))
> +            {
> +                return false;
> +            }
>          }
>          return true;
>      }
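The caching behaviour described in the issue can be modelled without PDFBox. The sketch below is a deliberately simplified, JDK-only stand-in: FilteringCacheSketch and Resource are hypothetical names, not PDFBox API, and real resources are keyed and typed differently. The cache consults a predicate before storing anything, the way isAllowedCache is described as doing; with a predicate that rejects the Image subtype, as the proposed patch does, a large decoded image is never retained by the cache and only the small entries accumulate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical, simplified model of a resource cache guarded by an
 * "is this allowed to be cached?" check. None of these names are PDFBox API.
 */
public class FilteringCacheSketch {

    /** Stand-in for an XObject: the value of its /Subtype entry plus its decoded size. */
    public static final class Resource {
        public final String subtype;  // e.g. "Image" or "Form"
        public final long sizeBytes;  // used only to report retained memory

        public Resource(String subtype, long sizeBytes) {
            this.subtype = subtype;
            this.sizeBytes = sizeBytes;
        }
    }

    private final Map<String, Resource> entries = new HashMap<>();
    private final Predicate<Resource> allowed;

    public FilteringCacheSketch(Predicate<Resource> allowed) {
        this.allowed = allowed;
    }

    /** Stores the resource only if the predicate allows it; returns whether it was stored. */
    public synchronized boolean put(String key, Resource r) {
        if (!allowed.test(r)) {
            return false; // the caller can still use r, but the cache never keeps it reachable
        }
        entries.put(key, r);
        return true;
    }

    /** Total bytes the cache is currently keeping reachable. */
    public synchronized long retainedBytes() {
        long total = 0;
        for (Resource r : entries.values()) {
            total += r.sizeBytes;
        }
        return total;
    }

    public static void main(String[] args) {
        // Mirrors the proposed fix: reject anything whose /Subtype is /Image
        FilteringCacheSketch cache = new FilteringCacheSketch(r -> !"Image".equals(r.subtype));
        cache.put("Im1", new Resource("Image", 30_000_000L)); // hypothetical large decoded image
        cache.put("Fm1", new Resource("Form", 4_096L));       // hypothetical small form XObject
        System.out.println("retained bytes: " + cache.retainedBytes()); // prints "retained bytes: 4096"
    }
}
```

In this model the trade-off is that a rejected image can never be served from the cache later, so it has to be loaded again each time it is needed, in exchange for a cache whose size no longer grows with the images it sees.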



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
