[ https://issues.apache.org/jira/browse/TIKA-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377450#comment-17377450 ]
Tim Allison commented on TIKA-3470:
-----------------------------------
The more I look at this, the more I think we should punt on the warning and
rely on PDFBox's log.error that no jpx reader is available. We could add
code that looks for the JPXDecode filter, but it would be error-prone. I
confirmed that PDFBox logs an error if the jpeg2000 library is not available.

Unless there are objections, I'll add this to the "breaking changes" section of
the changes file and remove the warning from our PDFParser.
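
For reference, the kind of availability check under discussion can be sketched
with plain javax.imageio. This is only an illustration, not the code in
PDFParser or PDFBox: the class name Jpeg2000Check and the method hasReader are
hypothetical, and "JPEG2000" as the ImageIO format name registered by the
optional jpeg2000 library is an assumption.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper, not part of Tika or PDFBox.
public class Jpeg2000Check {

    // Returns true if at least one ImageIO reader is registered
    // for the given informal format name.
    static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers =
                ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Assumption: the optional jpeg2000 library registers a reader
        // under the format name "JPEG2000".
        if (!hasReader("JPEG2000")) {
            System.err.println("No JPEG2000 ImageIO reader is registered; "
                    + "JPX-encoded images in PDFs cannot be decoded.");
        } else {
            System.out.println("A JPEG2000 ImageIO reader is registered.");
        }
    }
}
```

A check like this only says whether a reader is on the classpath. It says
nothing about whether a given PDF actually contains a jpeg2000 image, which is
the part that would be error-prone to detect on our side.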
> Push jpeg2000 warning to trigger only when necessary
> ----------------------------------------------------
>
> Key: TIKA-3470
> URL: https://issues.apache.org/jira/browse/TIKA-3470
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Trivial
>
> We currently test for the jpeg2000 library on initialization of the PDFParser
> and log a warning if the non ASF 2.0-friendly library is not available. It
> would be better to trigger that warning only if users are processing files in
> a way that would require the library. That is, at parse time, and only if
> the user requests OCR or image extraction on a PDF that contains a jpeg2000.
> There's an example of a jpeg2000 inside a PDF here:
> https://github.com/mozilla/pdf.js/issues/11004
--
This message was sent by Atlassian Jira
(v8.3.4#803005)