[ 
https://issues.apache.org/jira/browse/PDFBOX-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493399#comment-17493399
 ] 

Thomas Ledoux commented on PDFBOX-5375:
---------------------------------------

My bad. I definitely should have explained my use case better first.

What I want to do is retrieve the dimensions of all the images in a PDF as 
efficiently as possible. In particular, I don't want to decode the images for 
that (because decoding is a lot of processing).

My latest blog post explains the case better: 
[https://openpreservation.org/blogs/scanned-vs-native-pdfs-how-to-differentiate-them/]

I managed to work around it, but the code is quite weird. Porting it to Tika 
would require better handling of all cases (even when we don't have the image 
decoder).

Hope this explains my needs better. 

> Allow creating of PDFXObjectImage without accessing to the image stream
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-5375
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5375
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.25, 3.0.0 PDFBox
>            Reporter: Thomas Ledoux
>            Priority: Major
>         Attachments: patch.txt, patch2.txt
>
>
> Currently, when a PDF embeds JPEG2000 images, simply parsing the file 
> generates a warning when the code calls getXObject(name) on a PDResources 
> for an image, and the underlying PDFXObjectImage object is not created, 
> because the third-party JAI library is absent.
> However, sometimes we just want to access the width or height properties 
> (which are defined outside the stream, in the associated dictionary).
> Looking at the constructor of PDFXObjectImage, it appears that the image is 
> always read to retrieve the colorspace.
> The proposed patch moves this initialization to the getColorSpace() method 
> so that the object is created and the Exception is raised only if the 
> image really needs to be accessed.
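
The idea described in the issue (build the object from the dictionary alone, and defer anything that needs the decoder until getColorSpace() is called) can be sketched roughly as follows. This is only an illustration in plain Java, not the attached patch and not PDFBox code: LazyImageDemo, LazyImage, the Map standing in for the image dictionary, and the Supplier standing in for the decoder-dependent work are all hypothetical names made up for the example.

```java
import java.util.Map;
import java.util.function.Supplier;

class LazyImageDemo {

    /** Hypothetical stand-in for an image XObject; NOT the PDFBox class. */
    static final class LazyImage {
        // Assumption: a plain Map plays the role of the image dictionary.
        private final Map<String, Integer> dictionary;
        // Assumption: a Supplier plays the role of the work that needs the stream/decoder.
        private final Supplier<String> colorSpaceLoader;
        // Cached result, filled in on the first successful getColorSpace() call.
        private String colorSpace;

        LazyImage(Map<String, Integer> dictionary, Supplier<String> colorSpaceLoader) {
            // The constructor only stores references; nothing is decoded here.
            this.dictionary = dictionary;
            this.colorSpaceLoader = colorSpaceLoader;
        }

        int getWidth() {
            return dictionary.get("Width");   // read from the dictionary only
        }

        int getHeight() {
            return dictionary.get("Height");  // read from the dictionary only
        }

        String getColorSpace() {
            // Deferred initialization: a missing decoder fails here, not in the constructor.
            if (colorSpace == null) {
                colorSpace = colorSpaceLoader.get();
            }
            return colorSpace;
        }
    }

    public static void main(String[] args) {
        // Simulate an image whose decoder is not installed.
        LazyImage image = new LazyImage(
                Map.of("Width", 640, "Height", 480),
                () -> { throw new IllegalStateException("decoder not available"); });

        // The dimensions are still readable.
        System.out.println(image.getWidth() + "x" + image.getHeight());

        try {
            image.getColorSpace();
        } catch (IllegalStateException e) {
            System.out.println("Failure deferred until color space access: " + e.getMessage());
        }
    }
}
```

With this shape, a caller that only needs the dimensions never triggers the decoder-dependent path, which is the behaviour the use case above is asking for.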



