Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "PDFParser (Apache PDFBox)" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29?action=diff&rev1=3&rev2=4

  == OCR ==

- Start with the instructions on [[TikaOCR|https://wiki.apache.org/tika/TikaOCR]]. In short, you need to have Tesseract installed.
+ Start with the instructions on [[https://wiki.apache.org/tika/TikaOCR|TikaOCR]]. In short, you need to have Tesseract installed.

  There are two ways of running OCR on PDFs:

   1. Extracting the inline images and letting Tesseract run on those.
