[Tika Wiki] Update of "PDFParser (Apache PDFBox)" by TimothyAllison

Apache Wiki Wed, 09 Nov 2016 07:46:56 -0800


The "PDFParser (Apache PDFBox)" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29?action=diff&rev1=3&rev2=4

  
  
  == OCR ==
- Start with the instructions on [[TikaOCR|https://wiki.apache.org/tika/TikaOCR]].  In short, you need to have Tesseract installed.
+ Start with the instructions on [[https://wiki.apache.org/tika/TikaOCR|TikaOCR]].  In short, you need to have Tesseract installed.
  
  There are two ways of running OCR on PDFs:
   1. Extracting the inline images and letting Tesseract run on those.
