Tim Allison created TIKA-1994:
---------------------------------

             Summary: Integrate OCR with PDFParser
                 Key: TIKA-1994
                 URL: https://issues.apache.org/jira/browse/TIKA-1994
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison


Users can now run OCR on individual images embedded inline in PDFs, provided 
they configure the parser accordingly.
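
For reference, a minimal sketch of what that configuration might look like 
through the Java API. It is an illustration, not necessarily the recommended 
setup: it assumes tika-parsers is on the classpath and Tesseract is installed 
and on the PATH. The class name OcrInlineImagesSketch and the buildContext 
helper are hypothetical, and the PDF path is taken from the first 
command-line argument.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.ocr.TesseractOCRConfig;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical class name; a sketch only.
public class OcrInlineImagesSketch {

    // Hypothetical helper: builds a ParseContext that asks the PDF parser
    // to hand inline images to the embedded-document parser.
    static ParseContext buildContext(Parser parser) {
        PDFParserConfig pdfConfig = new PDFParserConfig();
        // Inline image extraction is off by default.
        pdfConfig.setExtractInlineImages(true);

        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, pdfConfig);
        // Default Tesseract settings; assumes tesseract is on the PATH.
        context.set(TesseractOCRConfig.class, new TesseractOCRConfig());
        // Needed so that embedded images are parsed recursively.
        context.set(Parser.class, parser);
        return context;
    }

    public static void main(String[] args) throws Exception {
        Parser parser = new AutoDetectParser();
        // -1 disables the default write limit on extracted text.
        BodyContentHandler handler = new BodyContentHandler(-1);
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            parser.parse(in, handler, new Metadata(), buildContext(parser));
        }
        System.out.println(handler.toString());
    }
}
```

Note that this runs OCR once per extracted image, so a scanned page stored 
as several image tiles is recognized piece by piece.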

It might be useful to run OCR against each rendered page (instead of the 
component images). 

Integrating OCR is on the roadmap for PDFBox 2.1 (PDFBOX-1912).  Adding 
page-level OCR to Tika's PDFParser in the meantime will allow us to 
experiment with strategies until the cleaner integration is available in 
PDFBox 2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
