Re: [CODE4LIB] Scanned PDF to text

Kyle Banerjee Tue, 09 Dec 2014 08:47:02 -0800

> I’m not quite sure if I understand the question, but if all you want to do is 
> pull the text out of an OCR’ed PDF file, then I have found both Tika and 
> PDFtotext to be useful tools....
> 
> On the other hand, if you need to do the OCR itself, then employing Tesseract 
> is probably the way to go.


To clarify, I need to do the OCR itself. I've been using CAM::PDF to extract
existing text.
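
For the OCR step itself, here is a minimal sketch of one possible pipeline,
not a tested recipe. It assumes the pdftoppm (Poppler) and tesseract
command-line tools are installed and on the PATH; the function names, the
300 dpi default, and the "eng" language default are illustrative choices
only. Each page is rendered to a PNG and then passed to Tesseract.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_command(pdf_path, out_prefix, dpi=300):
    """Build the pdftoppm call that renders each PDF page to a PNG."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def tesseract_command(image_path, lang="eng"):
    """Build the tesseract call that prints recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def page_number(image_path):
    """Return the numeric suffix pdftoppm appends, e.g. page-12.png -> 12."""
    return int(Path(image_path).stem.rsplit("-", 1)[1])


def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Rasterize a scanned PDF and return the OCR text of all pages."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)
        pages = []
        # Sort by page number so the text comes out in document order.
        for image in sorted(Path(tmp).glob("page-*.png"), key=page_number):
            result = subprocess.run(
                tesseract_command(image, lang),
                check=True, capture_output=True, text=True,
            )
            pages.append(result.stdout)
        return "\n".join(pages)
```

Calling ocr_pdf("scan.pdf") with a hypothetical file name would return the
recognized text as one string. Higher dpi values may improve recognition on
small type, at the cost of slower processing.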

Kyle
