Tim Allison created TIKA-1994:
---------------------------------
Summary: Integrate OCR with PDFParser
Key: TIKA-1994
URL: https://issues.apache.org/jira/browse/TIKA-1994
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
Users can now run OCR on individual inline images embedded in PDFs, provided
they set the right configuration.
It might be useful to run OCR against each rendered page (instead of the
component images).
Integrating OCR is on the roadmap for PDFBox 2.1 (PDFBOX-1912). Integrating
it with Tika's PDFParser now will allow us to experiment with strategies until
the cleaner integration is available in PDFBox 2.1.
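
To make the two strategies concrete, here is a rough sketch of how the choice
changes what gets handed to the OCR engine. Everything in it is hypothetical:
the OcrStrategy enum, the planOcrTargets method, and the per-page image counts
are invented for illustration and are not part of the Tika or PDFBox API. It
does not render pages or call an OCR engine; it only lists the units of work
each strategy would produce.

```java
import java.util.ArrayList;
import java.util.List;

public class OcrStrategySketch {

    // Hypothetical names for illustration only; not an existing Tika or PDFBox type.
    enum OcrStrategy { NO_OCR, INLINE_IMAGES, RENDERED_PAGES }

    /**
     * Lists what would be sent to an OCR engine under each strategy.
     * inlineImagesPerPage[i] is the (assumed) number of inline images on page i + 1.
     */
    static List<String> planOcrTargets(OcrStrategy strategy, int[] inlineImagesPerPage) {
        List<String> targets = new ArrayList<>();
        for (int page = 0; page < inlineImagesPerPage.length; page++) {
            switch (strategy) {
                case INLINE_IMAGES:
                    // One OCR call per component image; pages without images are skipped.
                    for (int img = 0; img < inlineImagesPerPage[page]; img++) {
                        targets.add("page " + (page + 1) + ", image " + (img + 1));
                    }
                    break;
                case RENDERED_PAGES:
                    // One OCR call per page, regardless of how many images it holds.
                    targets.add("page " + (page + 1) + ", full render");
                    break;
                default:
                    // NO_OCR: nothing is queued.
                    break;
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        int[] imagesPerPage = {2, 0, 1}; // made-up three-page document
        for (OcrStrategy s : OcrStrategy.values()) {
            System.out.println(s + " -> " + planOcrTargets(s, imagesPerPage));
        }
    }
}
```

With per-image OCR the number of calls follows the number of component images,
while per-page OCR always makes one call per page. That difference is the kind
of trade-off this issue would let us experiment with.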