Re: [CODE4LIB] OCR PDFs

Alberto Accomazzi Tue, 21 Oct 2008 11:50:09 -0700

In the past we have used OCR Shop XTR from Vividata under Linux (a command-line utility, not an API). It can be scripted nicely under Linux and has given us decent results, despite some quirks, which may have been taken care of since we purchased our copy in 2003.


It looks like they now offer a version with image-over-text PDF output:
http://vividata.com/ocr_comparison.html


-- Alberto

From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
James Tuttle
Sent: Friday, October 17, 2008 7:57 AM
To: [email protected]
Subject: [CODE4LIB] OCR PDFs

I wonder if any of you might have experience with creating text PDFs
from TIFFs.  I've been using tiffcp to stitch TIFFs together into a
single image and then using tiff2pdf to generate PDFs from the single
TIFF.  I've had to pass this image-based PDF to someone with Acrobat to
use its batch processing facility to OCR the text and save a text-based
PDF.  I wonder if anyone has suggestions for software I can integrate
into the script (Python on Linux) I'm using.
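
For readers following along, the tiffcp and tiff2pdf steps described above might look roughly like the sketch below when driven from Python. This is only an illustration, not the script actually in use: the function names, the `work_dir` and `stem` parameters, and the file layout are all hypothetical, and it assumes Python 3.6+ with the libtiff command-line tools on the PATH. The OCR step is left out because the thread gives no command line for it.

```python
import subprocess
from pathlib import Path


def build_commands(tiff_pages, work_dir, stem):
    """Return the tiffcp and tiff2pdf argument lists plus the PDF path.

    All names here are hypothetical and chosen for illustration only.
    """
    work_dir = Path(work_dir)
    combined = work_dir / f"{stem}.tif"
    pdf = work_dir / f"{stem}.pdf"
    # tiffcp copies every input page into one multi-page TIFF file;
    # the last argument is the output file.
    stitch = ["tiffcp", *[str(p) for p in tiff_pages], str(combined)]
    # tiff2pdf wraps that TIFF in an image-only PDF; -o names the output.
    convert = ["tiff2pdf", "-o", str(pdf), str(combined)]
    return stitch, convert, pdf


def make_image_pdf(tiff_pages, work_dir, stem):
    """Run both steps in order, raising CalledProcessError on failure."""
    stitch, convert, pdf = build_commands(tiff_pages, work_dir, stem)
    for cmd in (stitch, convert):
        subprocess.run(cmd, check=True)
    # The result is still image-only; a separate OCR step is needed
    # to add a text layer.
    return pdf
```

Passing the commands as argument lists rather than a single shell string avoids quoting problems with file names that contain spaces.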

Thanks,
James


--
Dr. Alberto Accomazzi                  aaccomazzi(at)cfa harvard edu
Project Manager
NASA Astrophysics Data System                        ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics      www.cfa.harvard.edu
60 Garden St, MS 67, Cambridge, MA 02138, USA
