Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "TikaOCR" page has been changed by DaveMeikle:
https://wiki.apache.org/tika/TikaOCR?action=diff&rev1=4&rev2=5

Comment:
Added information for overriding default configuration

  
  `curl -T /path/to/tiff/image.tiff http://localhost:9998/tika --header 
"Content-type: image/tiff"`
  
+ = Overriding Default Configuration =
+ 
+ When using the OCR parser, Tika uses the following default settings:
+  * Tesseract installation path = ""
+  * Language dictionary = "eng"
+  * Page Segmentation Mode = "1"
+  * Minimum file size = 0
+  * Maximum file size = 2147483647
+  * Timeout = 120 (seconds)
+ 
+ To change these settings, you can either modify the existing 
TesseractOCRConfig.properties file in 
tika-parsers/src/main/resources/org/apache/tika/parser/ocr, or override it by 
creating your own and placing it in the package org/apache/tika/parser/ocr on 
your classpath.
+ 
+ Note that when using one of the executable JARs (tika-app or tika-server), 
this requires launching them without the ''-jar'' option, because ''-jar'' 
makes Java ignore any classpath given with ''-cp''. For example, for tika-app 
and tika-server, respectively:
+ 
+ `java -cp /path/to/your/classpath:/path/to/tika-app-X.X.jar 
org.apache.tika.cli.TikaCLI`
+ 
+ `java -cp /path/to/your/classpath:/path/to/tika-server-1.7-SNAPSHOT.jar 
org.apache.tika.server.TikaServerCli`
+ 
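As a rough sketch of the override approach described above, the following shell snippet builds a classpath directory containing a custom TesseractOCRConfig.properties and prints a matching launch command. The directory name `tika-ocr-override` is hypothetical, and the property key names are an assumption based on the file shipped with Tika; check them against the copy in your Tika version before relying on them. Only the timeout is changed from the defaults listed above.

```shell
#!/bin/sh
# Hypothetical directory that will be put on the classpath ahead of the Tika JAR.
OVERRIDE_DIR="${OVERRIDE_DIR:-$PWD/tika-ocr-override}"

# The properties file must sit in the org/apache/tika/parser/ocr package.
PKG_DIR="$OVERRIDE_DIR/org/apache/tika/parser/ocr"
mkdir -p "$PKG_DIR"

# Key names below are assumed, not taken from the page above; verify them
# against the TesseractOCRConfig.properties in your Tika release.
cat > "$PKG_DIR/TesseractOCRConfig.properties" <<'EOF'
tesseractPath=
language=eng
pageSegMode=1
minFileSizeToOcr=0
maxFileSizeToOcr=2147483647
timeout=300
EOF

# Print (do not run) the launch command; the JAR path is a placeholder.
echo "java -cp $OVERRIDE_DIR:/path/to/tika-app-X.X.jar org.apache.tika.cli.TikaCLI"
```

Placing the override directory before the JAR on the classpath is intended to make the custom file take precedence over the one bundled inside the JAR.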
