To get TesseractOCRParserTest to run after installing Tesseract on OS X with
“brew install tesseract”, I had to set the paths explicitly.
Any thoughts on how we could tell users that they might need to tweak the
path to run the unit tests? I was thinking about adding some sort of
message, but I don’t know whether that is a pattern we already use in Tika
for these external dependencies.
diff --git a/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java b/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
index 9ebcee068..32db2c442 100644
--- a/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
+++ b/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
@@ -51,6 +51,7 @@ public class TesseractOCRParserTest extends TikaTest {
     public static boolean canRun() {
         TesseractOCRConfig config = new TesseractOCRConfig();
+        config.setTesseractPath("/usr/local/bin");
         TesseractOCRParserTest tesseractOCRTest = new TesseractOCRParserTest();
         return tesseractOCRTest.canRun(config);
     }
@@ -164,6 +165,8 @@ public class TesseractOCRParserTest extends TikaTest {
             BasicContentHandlerFactory.HANDLER_TYPE handlerType,
             TesseractOCRConfig.OUTPUT_TYPE outputType) throws Exception {
         TesseractOCRConfig config = new TesseractOCRConfig();
+        config.setTesseractPath("/usr/local/bin");
+        config.setTessdataPath("/usr/local/Cellar/tesseract/4.1.0/share/tessdata");
         config.setOutputType(outputType);
         Parser parser = new RecursiveParserWrapper(new AutoDetectParser(),
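
One possible alternative to hard-coding the paths in the test is sketched
here. It is only an illustration, not an existing Tika mechanism: the
"tesseract.path" system property, the TESSERACT_PATH environment variable,
and the TesseractLocator class are all made-up names. The idea is to take an
explicit override if one is given, otherwise search the PATH, and otherwise
produce a message telling the user what to set.

```java
import java.io.File;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper; nothing here is part of Tika.
class TesseractLocator {
    // Assumed names for the override knobs (not existing Tika settings).
    static final String PATH_PROPERTY = "tesseract.path";
    static final String PATH_ENV = "TESSERACT_PATH";

    /** Returns the first directory in pathValue that holds an executable file named exeName. */
    public static Optional<String> findOnPath(String exeName, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, exeName);
            if (candidate.isFile() && candidate.canExecute()) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    /** An explicit override wins; otherwise fall back to searching pathValue. */
    public static Optional<String> resolveTesseractDir(String override, String pathValue) {
        if (override != null && !override.isEmpty()) {
            return Optional.of(override);
        }
        // Unix-style executable name; Windows would need "tesseract.exe".
        return findOnPath("tesseract", pathValue);
    }

    /** Message to show when the tests are skipped because nothing was found. */
    public static String skipMessage() {
        return "Skipping Tesseract tests: no 'tesseract' executable found. "
                + "Set -D" + PATH_PROPERTY + "=<dir> or " + PATH_ENV
                + " to the directory that contains it"
                + " (e.g. /usr/local/bin for a Homebrew install).";
    }

    public static void main(String[] args) {
        // The system property takes precedence over the environment variable.
        String override = System.getProperty(PATH_PROPERTY, System.getenv(PATH_ENV));
        Optional<String> dir = resolveTesseractDir(override, System.getenv("PATH"));
        if (dir.isPresent()) {
            System.out.println("Using tesseract from " + dir.get());
        } else {
            System.out.println(skipMessage());
        }
    }
}
```

In the test, the resolved directory could be passed to
config.setTesseractPath(...), and the message could go to JUnit 4's
Assume.assumeTrue(String, boolean) so that a skipped run explains itself.
The tessdata path would need a similar override.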
_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> |
My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>