Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "TikaOCR" page has been changed by SergeyTsalkov:
https://wiki.apache.org/tika/TikaOCR?action=diff&rev1=6&rev2=7

  
  `java -cp /path/to/your/classpath:/path/to/tika-server-1.7-SNAPSHOT.jar org.apache.tika.server.TikaServerCli`
  
+ = Disable Tika OCR =
+ Tika runs OCR not only on images you upload directly but also on images 
embedded in other files, such as office documents. Because OCR slows Tika down, 
you may want to disable it if you don't need the results. The simplest way is to 
uninstall Tesseract; if that is not an option, the following tika.xml config 
file excludes the OCR parser:
+ {{{
+ <?xml version="1.0" encoding="UTF-8"?>
+ <properties>
+   <parsers>
+     <parser class="org.apache.tika.parser.DefaultParser">
+       <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
+     </parser>
+   </parsers>
+ </properties>
+ }}}
+ 
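Because a typo in the class name would leave the exclusion without effect, it can help to check the file before handing it to Tika. The following is a minimal sketch that uses only the JDK's XML APIs and does not call Tika itself; the class name `OcrConfigCheck` and the method `excludesTesseract` are hypothetical helpers invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: inspects a config string with plain JDK XML APIs.
public class OcrConfigCheck {
    static final String TESSERACT = "org.apache.tika.parser.ocr.TesseractOCRParser";

    // Returns true if any <parser-exclude> element names the Tesseract OCR parser.
    // Throws IllegalArgumentException if the XML cannot be parsed.
    public static boolean excludesTesseract(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("config is not well-formed XML", e);
        }
        NodeList excludes = doc.getElementsByTagName("parser-exclude");
        for (int i = 0; i < excludes.getLength(); i++) {
            Element exclude = (Element) excludes.item(i);
            if (TESSERACT.equals(exclude.getAttribute("class"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same structure as the config file shown above, inlined for the demo.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<properties><parsers>"
                + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
                + "<parser-exclude class=\"" + TESSERACT + "\"/>"
                + "</parser></parsers></properties>";
        System.out.println(excludesTesseract(xml));
    }
}
```

This only confirms that the file is well-formed and names the Tesseract parser in a `parser-exclude` element; whether Tika actually skips OCR still depends on the file being loaded as its configuration.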
