Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "TikaOCR" page has been changed by DaveMeikle:
https://wiki.apache.org/tika/TikaOCR?action=diff&rev1=4&rev2=5

Comment:
Added information for overriding default configuration

  `curl -T /path/to/tiff/image.tiff http://localhost:9998/tika --header "Content-type: image/tiff"`
+ = Overriding Default Configuration =
+
+ When using the OCR parser, Tika uses the following default settings:
+  * Tesseract installation path = ""
+  * Language dictionary = "eng"
+  * Page Segmentation Mode = "1"
+  * Minimum file size = 0 (bytes)
+  * Maximum file size = 2147483647 (bytes)
+  * Timeout = 120 (seconds)
+
+ To change these settings, you can either modify the existing TesseractOCRConfig.properties file in tika-parsers/src/main/resources/org/apache/tika/parser/ocr and rebuild Tika, or override it by creating your own copy and placing it in the package org/apache/tika/parser/ocr on your classpath.
+
+ Note that if you override the file while using one of the executable JARs (tika-app or tika-server), you must launch them without the ''-jar'' option, because `java -jar` ignores any classpath given with `-cp`. For example, for tika-app and tika-server respectively:
+
+ `java -cp /path/to/your/classpath:/path/to/tika-app-X.X.jar org.apache.tika.cli.TikaCLI`
+
+ `java -cp /path/to/your/classpath:/path/to/tika-server-1.7-SNAPSHOT.jar org.apache.tika.server.TikaServerCli`
+
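A minimal shell sketch of the classpath override, assuming a POSIX shell and assuming the bundled file uses the keys `tesseractPath`, `language`, `pageSegMode`, `minFileSizeToOcr`, `maxFileSizeToOcr` and `timeout` (check these against the TesseractOCRConfig.properties shipped with your Tika version). The directory `./tika-config` and the variables `CONF_DIR` and `TIKA_APP_JAR` are hypothetical. The script writes an override that keeps the defaults but raises the timeout to 300 seconds, then, if the jar exists, starts TikaCLI with the override directory placed before the jar on the classpath, on the assumption that the first matching resource on the classpath wins.

```shell
#!/bin/sh
# Sketch only: CONF_DIR and TIKA_APP_JAR are hypothetical names; adjust to your setup.
CONF_DIR="${CONF_DIR:-./tika-config}"
TIKA_APP_JAR="${TIKA_APP_JAR:-/path/to/tika-app-X.X.jar}"

# The override must live in the package directory org/apache/tika/parser/ocr.
OCR_DIR="$CONF_DIR/org/apache/tika/parser/ocr"
mkdir -p "$OCR_DIR"

# Assumed property keys; every value is the documented default except timeout.
cat > "$OCR_DIR/TesseractOCRConfig.properties" <<'EOF'
tesseractPath=
language=eng
pageSegMode=1
minFileSizeToOcr=0
maxFileSizeToOcr=2147483647
timeout=300
EOF

# Launch without -jar so that -cp is honored; the override directory comes first.
if [ -f "$TIKA_APP_JAR" ]; then
  java -cp "$CONF_DIR:$TIKA_APP_JAR" org.apache.tika.cli.TikaCLI "$@"
else
  echo "tika-app jar not found at $TIKA_APP_JAR; override written to $OCR_DIR" >&2
fi
```

Saved as, say, a hypothetical `run-tika-ocr.sh`, it could be invoked as `TIKA_APP_JAR=/path/to/tika-app-X.X.jar ./run-tika-ocr.sh --text image.tiff`, with any remaining arguments passed straight through to TikaCLI.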
