Tim Allison created TIKA-1994:
---------------------------------

             Summary: Integrate OCR with PDFParser
                 Key: TIKA-1994
                 URL: https://issues.apache.org/jira/browse/TIKA-1994
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison

With the right configuration, users can now run OCR on individual inline images embedded in PDFs. It might also be useful to run OCR against each rendered page instead of its component images. OCR integration is on the roadmap for PDFBox 2.1 (PDFBOX-1912). Integrating OCR with PDFParser now will let us experiment with strategies until the cleaner integration is available in PDFBox 2.1.