Eric Pugh created TIKA-2093:
-------------------------------
Summary: Add hOCR output type to the TesseractOCRParser
Key: TIKA-2093
URL: https://issues.apache.org/jira/browse/TIKA-2093
Project: Tika
Issue Type: Improvement
Components: ocr
Affects Versions: 1.13
Reporter: Eric Pugh
Fix For: 1.14
I've tweaked the TesseractOCRParser and TesseractOCRConfig to add "txt" and
"hocr" parameters that allow you to request a specific output format. Tesseract
also has a "pdf" output and, in its next version, a "tsv" output, but I didn't
add support for those.
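
For illustration only, here is a minimal sketch of how an output type setting could map onto a Tesseract command line. It is not the code from the patch: the class, enum, and method names (TesseractCommandSketch, OutputType, buildCommand) are hypothetical. The one thing it relies on from Tesseract itself is that appending the "hocr" config name to the command requests hOCR output, while omitting it gives plain text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names are assumptions, not Tika's actual API.
final class TesseractCommandSketch {

    /** The two output types named in this issue. */
    enum OutputType { TXT, HOCR }

    /**
     * Builds the argument list for: tesseract <image> <outputbase> [configfile].
     * Plain text is Tesseract's default output, so no config name is added for TXT.
     */
    static List<String> buildCommand(String imagePath, String outputBase, OutputType type) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imagePath);
        cmd.add(outputBase);
        if (type == OutputType.HOCR) {
            // "hocr" is a standard Tesseract config name that switches output to hOCR.
            cmd.add("hocr");
        }
        return cmd;
    }
}
```

The resulting list could be handed to a ProcessBuilder. Keeping the choice in an enum rather than a free-form string would make it harder to pass an unsupported type such as "pdf" or "tsv" by accident.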
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)