-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/
-----------------------------------------------------------
(Updated Sept. 18, 2014, 10:07 p.m.)
Review request for tika and Chris Mattmann.
Changes
-------
Updated the patch to use JUnit's Assume to skip the tests when Tesseract is not
installed, and cleaned up some of the exception throwing.
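For readers unfamiliar with the skip-if-missing pattern, here is a minimal sketch of how such a check might look. It is not the code in the patch: the class name TesseractCheck and the method canRun are hypothetical, and the sketch assumes the executable is called "tesseract" and is on the PATH.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of the patch): reports whether an external
// command can be started at all, which is the precondition an OCR test needs.
class TesseractCheck {

    /**
     * Returns true if the command could be started and ran to completion,
     * regardless of its exit code. Returns false if it could not be started
     * (for example, the binary is not installed) or the wait was interrupted.
     */
    static boolean canRun(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Drain the output so the child cannot block on a full pipe.
            try (InputStream in = process.getInputStream()) {
                byte[] buffer = new byte[1024];
                while (in.read(buffer) != -1) {
                    // discard
                }
            }
            process.waitFor();
            return true;
        } catch (IOException e) {
            // Thrown by start() when the executable cannot be found or run.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In a JUnit 4 test, a call such as org.junit.Assume.assumeTrue(TesseractCheck.canRun("tesseract", "-v")) at the top of the test method would cause the test to be reported as ignored instead of failed on machines without Tesseract. The "-v" argument is an assumption about the local Tesseract command line.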
Bugs: TIKA-93
https://issues.apache.org/jira/browse/TIKA-93
Repository: tika
Description
-------
Integrates Tesseract OCR with Tika through a new Parser. See TIKA-93.
Diffs (updated)
-----
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java
PRE-CREATION
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
PRE-CREATION
trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java
PRE-CREATION
trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
1624766
Diff: https://reviews.apache.org/r/22402/diff/
Testing
-------
Tested by extracting the text from an embedded image in DOCX, PPTX, and PDF files.
Thanks,
Tyler Palsulich