[ https://issues.apache.org/jira/browse/PDFBOX-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952029#comment-17952029 ]
Tilman Hausherr commented on PDFBOX-6010:
-----------------------------------------

How-to questions should be asked on the users mailing list or on Stack Overflow. But I'll still answer this one: the best approach is to look at the source code of ExtractImages and adjust it to your needs. It passes through the page content stream (and other streams) like a renderer would. You extend the PDFGraphicsStreamEngine class and implement drawImage(). The downside is that you may get some images many times, and you will miss orphan images.

> PDF Image Extraction resulting in an infinite recursion
> -------------------------------------------------------
>
>                 Key: PDFBOX-6010
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6010
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Kabir Soneja
>            Priority: Major
>              Labels: how-to
>
> Hi,
> I am working on extracting images from a PDF using pdfbox version 2.0.34.
> To do so, we have our own recursive logic that walks through all
> PDResources for each page, and within each page we check all the objects
> to filter out images. This recursive logic has a max depth of 25 to avoid
> infinite recursion.
> When trying out the image extraction for the same PDF using the CLI, the
> image is extracted within a second, indicating that the image extraction logic
> within the pdfbox source code handles image extraction using an
> ImageGraphicsEngine defined within the source code.
> Can you help me understand:
> * To handle image extraction, is there any API directly provided by
> PDFBox?
> * Is there any way to reuse the image extraction logic within the source
> code, i.e., is it exposed as a public API?
> * Any other suggestions to handle image extraction gracefully with/without
> recursion?
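
On the last question in the quoted report (handling this with or without recursion), here is a minimal sketch of one general technique: replace the recursive walk with an explicit stack and remember which objects were already visited. It assumes the runaway recursion comes from resources that refer back to something already seen. ResourceNode, ImageWalkSketch and collectImageNames are hypothetical names used only for illustration; none of them are PDFBox classes, and this is not how ExtractImages is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one object in a page's resource tree.
// NOT a PDFBox class: it only models "an object that may be an image
// and may hold nested resources".
class ResourceNode {
    final String name;
    final boolean image;
    final List<ResourceNode> children = new ArrayList<>();

    ResourceNode(String name, boolean image) {
        this.name = name;
        this.image = image;
    }
}

class ImageWalkSketch {
    // Iterative depth-first walk. The explicit stack replaces recursion, and
    // the identity-based "seen" set ensures each object is expanded only once,
    // so cycles terminate and a shared image is reported a single time.
    static List<String> collectImageNames(ResourceNode root) {
        List<String> found = new ArrayList<>();
        Set<ResourceNode> seen =
                Collections.newSetFromMap(new IdentityHashMap<ResourceNode, Boolean>());
        Deque<ResourceNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            ResourceNode node = stack.pop();
            if (!seen.add(node)) {
                continue; // already handled: a repeat or a back-reference
            }
            if (node.image) {
                found.add(node.name);
            }
            for (ResourceNode child : node.children) {
                stack.push(child);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        ResourceNode page = new ResourceNode("page", false);
        ResourceNode form = new ResourceNode("form", false);
        ResourceNode logo = new ResourceNode("logo", true);
        page.children.add(form);
        page.children.add(logo);
        form.children.add(logo); // the same image referenced twice
        form.children.add(page); // back-reference that would recurse forever
        System.out.println(collectImageNames(page));
    }
}
```

With a visited set the walk no longer needs a fixed depth limit such as 25. Unlike the content-stream approach described in the comment, a walk over resources of this kind visits objects whether or not the page actually draws them.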