[ https://issues.apache.org/jira/browse/TIKA-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison reassigned TIKA-1994:
---------------------------------
Assignee: Tim Allison
> Integrate OCR with PDFParser
> ----------------------------
>
> Key: TIKA-1994
> URL: https://issues.apache.org/jira/browse/TIKA-1994
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Assignee: Tim Allison
>
> Users can now run OCR on individual images embedded inline in PDFs if they
> configure Tika appropriately.
> It might be useful to run OCR against each rendered page (instead of the
> component images).
> Integrating OCR is on the roadmap for PDFBox 2.1 (PDFBOX-1912). Adding
> page-level OCR to Tika's PDFParser now would let us experiment with
> strategies until the cleaner integration is available in PDFBox 2.1.
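
To make the two strategies in the description concrete, here is a minimal sketch in plain Java. It is not Tika or PDFBox code: the `Page` class, the `Strategy` enum, and `runOcr` are hypothetical names introduced only for illustration, strings stand in for image data, and the OCR engine is passed in as a function so that no real OCR library is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these types exist in Tika or PDFBox.
class OcrStrategySketch {

    // Stand-in for a PDF page: its inline images plus one rendering of the whole page.
    static final class Page {
        final List<String> inlineImages;
        final String renderedPage;

        Page(List<String> inlineImages, String renderedPage) {
            this.inlineImages = inlineImages;
            this.renderedPage = renderedPage;
        }
    }

    // The two approaches discussed in the issue.
    enum Strategy { INLINE_IMAGES, RENDERED_PAGE }

    // Applies the supplied OCR function either to each inline image
    // or to each rendered page, and collects the recognized text in order.
    static List<String> runOcr(List<Page> pages, Strategy strategy,
                               Function<String, String> ocr) {
        List<String> text = new ArrayList<>();
        for (Page page : pages) {
            if (strategy == Strategy.RENDERED_PAGE) {
                // One OCR call per page, regardless of how many images it contains.
                text.add(ocr.apply(page.renderedPage));
            } else {
                // One OCR call per embedded image; pages without images yield nothing.
                for (String image : page.inlineImages) {
                    text.add(ocr.apply(image));
                }
            }
        }
        return text;
    }
}
```

In this sketch, a page with no inline images contributes no text under `INLINE_IMAGES` but still gets one OCR call under `RENDERED_PAGE`. That difference is one of the things a per-page strategy would let us evaluate.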