Kabir Soneja created PDFBOX-6010:
------------------------------------

             Summary: PDF Image Extraction resulting in an infinite recursion
                 Key: PDFBOX-6010
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6010
             Project: PDFBox
          Issue Type: Bug
            Reporter: Kabir Soneja


Hi,

I am working on extracting images from a PDF using PDFBox version 2.0.34. To 
do so, we have our own recursive logic that walks through all PDResources of 
each page and checks every object within them to filter out images. This 
recursive logic has a maximum depth of 25 to avoid infinite recursion.

When we try image extraction for the same PDF using the CLI, the image is 
extracted within a second, which suggests that the extraction logic in the 
PDFBox source code handles this case through the ImageGraphicsEngine defined 
there.


 * Is there any API provided directly by PDFBox to handle image extraction?
 * Is there any way to reuse the image extraction logic within the source code, 
i.e., is it exposed as a public API?
 * Any other suggestions to handle image extraction gracefully with/without 
recursion?
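
To illustrate the last question, here is a minimal sketch of a traversal that
needs no depth limit: it uses an explicit stack plus an identity-based visited
set, so a form that refers back to one of its ancestors is processed only once.
The Node class and the collectImages method are hypothetical stand-ins and not
part of PDFBox. In PDFBox 2.0.x the children of a resource dictionary would
presumably come from PDResources.getXObjectNames() and
PDResources.getXObject(COSName), and a nested form's resources from
PDFormXObject.getResources().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ResourceWalk {

    /** Hypothetical stand-in for an XObject: an image, or a form with nested children. */
    static final class Node {
        final String name;
        final boolean image;
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean image) {
            this.name = name;
            this.image = image;
        }
    }

    /** Collects the names of all images reachable from root, visiting each node at most once. */
    static List<String> collectImages(Node root) {
        List<String> images = new ArrayList<>();
        // Identity semantics: two references to the same object count as one visit.
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (!visited.add(current)) {
                continue; // already handled: shared or self-referencing resource
            }
            if (current.image) {
                images.add(current.name);
            } else {
                for (Node child : current.children) {
                    stack.push(child);
                }
            }
        }
        return images;
    }

    public static void main(String[] args) {
        Node page = new Node("page", false);
        Node form = new Node("Fm0", false);
        Node img = new Node("Im0", true);
        page.children.add(form);
        page.children.add(img);
        form.children.add(img);  // the same image referenced a second time
        form.children.add(page); // a cycle back to the starting point
        System.out.println(collectImages(page)); // prints [Im0]
    }
}
```

Because termination depends on the visited set rather than on a depth counter,
the walk finishes even when the resource graph contains a cycle, and each image
is reported once no matter how many forms reference it.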



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
