John Hewson created PDFBOX-2313:
-----------------------------------
Summary: ExtractImages finds never-rendered images
Key: PDFBOX-2313
URL: https://issues.apache.org/jira/browse/PDFBOX-2313
Project: PDFBox
Issue Type: Bug
Components: Utilities
Affects Versions: 2.0.0
Reporter: John Hewson

The file from PDFBOX-2101 still causes unexpectedly high memory use with
ExtractImages compared to PDFToImage. Since PDFToImage uses the same caching
strategy, this is not really a caching issue, though we might still want to
think about that.

The PDF contains 55 images on the first page that are never rendered, and
ExtractImages runs out of memory trying to extract them all. Given that PDFs
often contain junk like this, I suggest that ExtractImages extract only images
which are actually drawn to the page at some point.
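
To illustrate the suggestion, here is a minimal, library-free sketch of the
idea: an image listed in a page's resources counts as "drawn" only if the
content stream invokes it with the PDF "Do" operator. The class and method
names (DrawnImageFilter, drawnNames, imagesToExtract) are hypothetical and are
not part of the PDFBox API, and splitting on whitespace is a simplifying
assumption that stands in for a real content stream parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not PDFBox code: models "extract only drawn images".
class DrawnImageFilter {

    // Returns the XObject names that the content stream paints via the PDF
    // "Do" operator, e.g. "/Im2 Do". Splitting on whitespace is a
    // simplification: a real parser must also handle strings, comments,
    // inline images and form XObjects that draw images themselves.
    static Set<String> drawnNames(String contentStream) {
        Set<String> drawn = new LinkedHashSet<>();
        String[] tokens = contentStream.trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].equals("Do") && tokens[i - 1].startsWith("/")) {
                drawn.add(tokens[i - 1].substring(1)); // drop the leading '/'
            }
        }
        return drawn;
    }

    // Keeps only the resource names that are actually drawn, in the order
    // they appear in the resource list.
    static List<String> imagesToExtract(List<String> resourceImageNames,
                                        String contentStream) {
        Set<String> drawn = drawnNames(contentStream);
        List<String> result = new ArrayList<>();
        for (String name : resourceImageNames) {
            if (drawn.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three images are declared in the resources, but only Im2 is painted.
        List<String> resources = Arrays.asList("Im1", "Im2", "Im3");
        String content = "q 100 0 0 100 0 0 cm /Im2 Do Q";
        System.out.println(imagesToExtract(resources, content)); // prints [Im2]
    }
}
```

With this approach the 55 unused images would never be decoded, because only
names reached through a "Do" invocation are passed on for extraction. A real
fix would presumably hook into the same content stream processing that the
renderer uses rather than re-tokenizing the stream.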