Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "TikaOCR" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaOCR?action=diff&rev1=11&rev2=12

  With [[https://issues.apache.org/jira/browse/TIKA-93|TIKA-93]] you can now use the awesome Tesseract OCR parser within Tika!

- First some instructions on getting it installed.
+ First some instructions on getting it installed. See Tesseract's [[https://github.com/tesseract-ocr/tesseract/wiki|readme]].

  = Mac Installation Instructions =

@@ -27, +27 @@

   2. uninstall leptonica `brew uninstall leptonica`
   3. install leptonica with tiff support `brew install leptonica --with-libtiff`
   4. install tesseract `brew install tesseract --all-languages --with-serial-num-pack`
+
+ = Installing Tesseract on RHEL =
+  1. Add "epel" to your yum repositories if it isn't already installed
+
+  1a. `wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm` (or appropriate version)
+
+  1b. `rpm -Uvh epel-release-latest-7.noarch.rpm`
+
+  2. `yum install tesseract`
+  3. To add language packs, see what's available `yum search tesseract` then, e.g. `yum install tesseract-langpack-ara`
+
+ = Installing Tesseract on Windows =
+ See [[https://github.com/UB-Mannheim/tesseract/wiki|UB-Mannheim]].

  = Using Tika and Tesseract =
