[ https://issues.apache.org/jira/browse/PDFBOX-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson resolved PDFBOX-2313.
---------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0
> ExtractImages finds never-rendered images
> -----------------------------------------
>
> Key: PDFBOX-2313
> URL: https://issues.apache.org/jira/browse/PDFBOX-2313
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.0.0
> Reporter: John Hewson
> Assignee: John Hewson
> Fix For: 2.0.0
>
>
> The file from PDFBOX-2101 still causes unexpectedly high memory use with
> ExtractImages compared to PDFToImage. Given that PDFToImage uses the same
> caching strategy, caching is probably not the cause, though the caching
> strategy may still be worth revisiting.
> The PDF contains 55 images on the first page that are never rendered, and
> ExtractImages runs out of memory trying to extract them all. Given that PDFs
> often contain junk like this, I suggest that ExtractImages only extract
> images which are actually drawn to the page at some point.