[ https://issues.apache.org/jira/browse/PDFBOX-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson reassigned PDFBOX-2313:
-----------------------------------

    Assignee: John Hewson

> ExtractImages finds never-rendered images
> -----------------------------------------
>
>                 Key: PDFBOX-2313
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2313
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>
> The file from PDFBOX-2101 is still causing unexpectedly high memory use with
> ExtractImages when compared to PDFToImage. Given that PDFToImage uses the
> same caching strategy, it's not really a caching issue, though we might still
> want to think about that.
> The PDF contains 55 images on the first page which are never rendered and
> ExtractImages runs out of memory trying to extract them all. Given that PDFs
> often contain junk like this, I suggest that ExtractImages only extract
> images which are actually drawn to the page at some point.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)