[
https://issues.apache.org/jira/browse/PDFBOX-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493399#comment-17493399
]
Thomas Ledoux commented on PDFBOX-5375:
---------------------------------------
My bad. I definitely should have explained my use case better first.
What I want to do is retrieve the dimensions of all the images of a PDF as
efficiently as possible. In particular, I don't want to decode the image for
that (because this is a lot of processing).
My latest blog post explains the case better:
[https://openpreservation.org/blogs/scanned-vs-native-pdfs-how-to-differentiate-them/]
I managed to work around it, but the code is quite weird. Porting it to Tika
would require better handling of all cases (even when we don't have the image
decoder).
Hope this explains my needs better.
> Allow creating of PDFXObjectImage without accessing to the image stream
> -----------------------------------------------------------------------
>
> Key: PDFBOX-5375
> URL: https://issues.apache.org/jira/browse/PDFBOX-5375
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.25, 3.0.0 PDFBox
> Reporter: Thomas Ledoux
> Priority: Major
> Attachments: patch.txt, patch2.txt
>
>
> Currently, when a PDF embeds JPEG2000 images, simply parsing the file
> generates a warning when the code calls getXObject(name) on a PDResources
> for an image, and the underlying PDFXObjectImage object is not created,
> because the third-party JAI library is absent.
> However, sometimes we just want to access the width or height properties
> (which are defined outside the stream, in the associated dictionary).
> Looking at the constructor of PDFXObjectImage, it appears that the image is
> always read to retrieve the colorspace.
> The proposed patch moves this initialization to the getColorSpace() method,
> so that the object is always created and the exception is raised only if the
> image really needs to be accessed.
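
The deferred initialization described above can be sketched roughly as follows. This is only an illustration of the idea, not the attached patch and not PDFBox code: LazyImage, its Map-based dictionary, and the colorSpaceLoader supplier are hypothetical stand-ins for the image XObject, its dictionary, and the stream decoding step.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for an image XObject; this is NOT the PDFBox class. */
final class LazyImage {
    // Stands in for the image dictionary, which holds Width and Height
    // outside the encoded stream.
    private final Map<String, Integer> dictionary;
    // Stands in for the expensive step that reads the image stream and
    // may fail when no decoder is available.
    private final Supplier<String> colorSpaceLoader;
    // Cached after the first successful load.
    private String colorSpace;

    LazyImage(Map<String, Integer> dictionary, Supplier<String> colorSpaceLoader) {
        // Nothing is decoded here, so construction cannot fail
        // because of a missing decoder.
        this.dictionary = dictionary;
        this.colorSpaceLoader = colorSpaceLoader;
    }

    int getWidth() {
        return dictionary.get("Width");
    }

    int getHeight() {
        return dictionary.get("Height");
    }

    String getColorSpace() {
        // The stream is only touched on first access; any decoder
        // failure surfaces here instead of in the constructor.
        if (colorSpace == null) {
            colorSpace = colorSpaceLoader.get();
        }
        return colorSpace;
    }
}
```

With this shape, a caller that only needs dimensions can build the object and call getWidth() and getHeight() even when the loader would throw; the failure is seen only by callers of getColorSpace().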
--
This message was sent by Atlassian Jira
(v8.20.1#820001)