In the past we have used OCR Shop XTR from Vividata under Linux
(a command-line utility, not an API). It scripts nicely under Linux
and has given us decent results, despite some quirks, which may have
been fixed since we purchased our copy in 2003.
It looks like they now offer a version with image-over-text PDF output:
http://vividata.com/ocr_comparison.html
-- Alberto
From: Code for Libraries [mailto:[EMAIL PROTECTED]] On Behalf Of
James Tuttle
Sent: Friday, October 17, 2008 7:57 AM
To: [email protected]
Subject: [CODE4LIB] OCR PDFs
I wonder if any of you might have experience with creating text PDFs
from TIFFs. I've been using tiffcp to stitch TIFFs together into a
single image and then using tiff2pdf to generate PDFs from the single
TIFF. I've had to pass this image-based PDF to someone with Acrobat to
use its batch processing facility to OCR the text and save a text-based
PDF. I wonder if anyone has suggestions for software I can integrate
into the script (Python on Linux) I'm using.
Thanks,
James
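
A minimal sketch of how the tiffcp and tiff2pdf steps described above might
be driven from Python, with a slot left open for a command-line OCR step.
The function names (build_commands, run_pipeline) and the ocr_cmd parameter
are illustrative assumptions, not part of any existing script. No OCR tool's
real options are shown: the caller would have to supply the complete command
line for whatever OCR utility is installed.

```python
import subprocess


def build_commands(tiff_pages, merged_tiff, pdf_out, ocr_cmd=None):
    """Return the argv list for each step, in order, without running anything."""
    if not tiff_pages:
        raise ValueError("need at least one TIFF page")
    commands = [
        # tiffcp: copy all input pages into one multi-page TIFF
        ["tiffcp", *[str(p) for p in tiff_pages], str(merged_tiff)],
        # tiff2pdf: wrap the merged TIFF in an image-only PDF
        ["tiff2pdf", "-o", str(pdf_out), str(merged_tiff)],
    ]
    if ocr_cmd is not None:
        # Hypothetical OCR step: the caller passes the full argv for
        # their own OCR tool; nothing about its options is assumed here.
        commands.append([str(part) for part in ocr_cmd])
    return commands


def run_pipeline(tiff_pages, merged_tiff, pdf_out, ocr_cmd=None):
    """Run each step in turn; a failing step raises CalledProcessError."""
    for argv in build_commands(tiff_pages, merged_tiff, pdf_out, ocr_cmd):
        subprocess.run(argv, check=True)
```

Calling run_pipeline(["p1.tif", "p2.tif"], "merged.tif", "merged.pdf") would
run only the two libtiff steps. Keeping command construction separate from
execution makes it easy to print and inspect the commands before running them.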
--
Dr. Alberto Accomazzi aaccomazzi(at)cfa harvard edu
Project Manager
NASA Astrophysics Data System ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics www.cfa.harvard.edu
60 Garden St, MS 67, Cambridge, MA 02138, USA