Yes, I've tried tesseract and found it to be pretty accurate, but I don't
believe there is a way to integrate the text back into the PDF. It's easy to
pull text out of image-based PDFs, but not to put the text back in. Driving me
crazy...

Thanks for the tips,
James

Bridger Dyson-Smith wrote:
> If you haven't already, take a look at tesseract
> (http://code.google.com/p/tesseract-ocr/). There's some discussion of using
> tesseract and shell scripting to work with tiffs to pdfs to ocr'd text,
> which isn't exactly what you're wanting to do, I know, but may prove helpful
> (http://www.groklaw.net/articlebasic.php?story=20061210115516438).
>
> Cheers!
> Bridger Dyson-Smith
>
> On Fri, Oct 17, 2008 at 8:28 AM, Terry Harrison <[EMAIL PROTECTED]> wrote:
>
>> You might want to look at ABBYY Fine Reader 9.0 Professional, which can be
>> driven from the command line. Fine Reader is used at the Library of
>> Congress. Here is an info link to get you started (search "command"):
>>
>> http://www.scanstore.com/Scanning/Document_Imaging/Software/OCR_Software/Nuance/omnipage_review.asp
>>
>> Regards,
>> Terry
>>
>> ------------------------------------
>> Terry Harrison
>> Project Manager
>> CACI
>> 5505 Robin Hood Road, Suite F
>> Norfolk, Va. 23508
>> Ph: 757.321.9120 x232
>> Fax: 757.321.8797
>> [EMAIL PROTECTED]

-- 
James Tuttle
Digital Repository Librarian
NCSU Libraries, Box 7111
North Carolina State University
Raleigh, NC 27695-7111
[EMAIL PROTECTED]
(919)513-0651 Phone
(919)515-3031 Fax
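
For anyone following the thread, here is a rough sketch of the "pull text out" half only, assuming the PDF pages have already been exported as TIFF images and that the tesseract binary is on the PATH. The function names (build_tesseract_command, ocr_page) are made up for illustration and are not part of tesseract; the only tesseract behavior relied on is the basic `tesseract <image> <outputbase>` invocation, which writes `<outputbase>.txt`. It does nothing about putting the text back into the PDF.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Return the argument list for `tesseract <image> <outputbase>`.

    Tesseract appends ".txt" to the output base on its own.
    (Hypothetical helper, named here only for illustration.)
    """
    return ["tesseract", str(image_path), str(output_base)]


def ocr_page(image_path, output_dir):
    """OCR one page image and return the recognized text.

    Assumes the tesseract binary is installed and on the PATH.
    """
    image_path = Path(image_path)
    output_base = Path(output_dir) / image_path.stem
    # check=True raises CalledProcessError if tesseract exits non-zero.
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Build the ".txt" name by appending, so stems containing dots survive.
    text_path = Path(str(output_base) + ".txt")
    return text_path.read_text(encoding="utf-8")
```

Calling ocr_page("page1.tif", "out") would leave out/page1.txt on disk and return its contents, provided the out directory already exists.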