[ https://issues.apache.org/jira/browse/PDFBOX-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson reassigned PDFBOX-2313:
-----------------------------------

    Assignee: John Hewson

> ExtractImages finds never-rendered images
> -----------------------------------------
>
>                 Key: PDFBOX-2313
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2313
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>
> The file from PDFBOX-2101 is still causing unexpectedly high memory use with 
> ExtractImages when compared to PDFToImage. Given that PDFToImage uses the 
> same caching strategy, it's not really a caching issue, though we might still 
> want to think about that.
> The PDF contains 55 images on the first page that are never rendered, and 
> ExtractImages runs out of memory trying to extract them all. Given that PDFs 
> often contain junk like this, I suggest that ExtractImages only extract 
> images that are actually drawn to the page at some point.
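
To illustrate the suggestion above, here is a minimal sketch of "only extract
what is drawn". It is a toy model, not PDFBox code: the class
DrawnImageFilter and its methods are hypothetical names, the page's image
resources are modelled as a plain list of names, and the content stream is
modelled as whitespace-separated tokens. It assumes an image counts as drawn
when its name is the operand of a Do operator. A real implementation would
rely on PDFBox's own content stream processing, and would also have to deal
with form XObjects, inline images and string operands, all of which this
sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of PDFBox: filters image resources down to
// those that the page's content stream actually paints.
class DrawnImageFilter {

    // Collects every name used as the operand of a "Do" operator, in order
    // of first use. Assumes tokens are separated by whitespace.
    public static Set<String> namesDrawn(String contentStream) {
        Set<String> drawn = new LinkedHashSet<>();
        String previous = null;
        for (String token : contentStream.trim().split("\\s+")) {
            if (token.equals("Do") && previous != null && previous.startsWith("/")) {
                drawn.add(previous.substring(1)); // strip the leading '/'
            }
            previous = token;
        }
        return drawn;
    }

    // Keeps only the resource names that are drawn, preserving resource order.
    public static List<String> imagesToExtract(Collection<String> resourceImageNames,
                                               String contentStream) {
        Set<String> drawn = namesDrawn(contentStream);
        List<String> result = new ArrayList<>();
        for (String name : resourceImageNames) {
            if (drawn.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three images are listed in the resources, but only Im1 is painted.
        List<String> resources = Arrays.asList("Im0", "Im1", "Im2");
        String content = "q 100 0 0 100 0 0 cm /Im1 Do Q";
        System.out.println(imagesToExtract(resources, content)); // prints [Im1]
    }
}
```

With a filter like this, the unused images are never decoded, so memory use
depends on what the page paints rather than on everything listed in its
resources.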



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)