If you are considering scanning to PDF format, you should certainly look
into Adobe's Acrobat Capture. This OCRs the scanned image (TIFF or
various other raster formats) and creates a PDF Text version either in
place of or underneath the image. If you choose not to keep the scanned
image (e.g., to save substantial space), the Captured version will
preserve any text it is unable to validate against its internal
dictionary as small pictures.
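To make that fallback concrete, here is a minimal Python sketch of the general idea. It is not Adobe's implementation. It assumes the OCR step has already produced a list of words with pixel bounding boxes; the `Word` class, the `split_words` function and the sample dictionary are all hypothetical. Words found in the dictionary are kept as text, and the boxes of everything else are returned so those regions could be kept as small image fragments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word (hypothetical structure, for illustration only)."""
    text: str
    bbox: tuple  # (left, top, right, bottom) in image pixels


def split_words(words: list[Word], dictionary: set[str]) -> tuple[list[str], list[tuple]]:
    """Separate dictionary-validated words from words to keep as pictures.

    Returns (text_words, image_regions): validated words keep their original
    spelling, unvalidated words are represented only by their bounding box.
    """
    text_words, image_regions = [], []
    for w in words:
        # Strip surrounding punctuation and lowercase before the lookup.
        key = w.text.strip(".,;:!?\"'()").lower()
        if key and key in dictionary:
            text_words.append(w.text)
        else:
            # Could not validate: keep this region of the scan as an image.
            image_regions.append(w.bbox)
    return text_words, image_regions


# Usage: "Nelsn" is not in the dictionary, so only its box is kept.
sample = [
    Word("Invoice", (10, 10, 80, 30)),
    Word("Nelsn", (90, 10, 150, 30)),
    Word("attached.", (160, 10, 240, 30)),
]
text, images = split_words(sample, {"invoice", "attached"})
print(text, images)
```

The design choice here is simply conservative: anything the dictionary cannot confirm is preserved as the original pixels rather than as a possibly wrong OCR guess.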
The resulting Captured PDF file is still a facsimile of the original, is
much smaller than the full raster image, and has the added advantage that
it is text-indexable and searchable.
Standard Acrobat includes a number of functions to clean up Captured
images: that is, to locate and correct text that Capture could not
convert or that it flagged as failing validation against its spelling
dictionary. These require manual intervention, but they provide a way to
rapidly process large volumes of scanned documentation into indexable,
validated text documents.
The OCR capabilities are quite impressive: even scanned faxes were
converted successfully.
My tests were done several years ago (so it is now a very mature
product), but I have not yet been able to convince the powers that be
internally that we should Capture all of our incoming correspondence and
other paper documentation.
Bill Hall
Documentation Systems Analyst
Strategy and Development Group
Tenix Defence
Nelson House, Nelson Place
Williamstown, Vic. 3016
Australia
Tel: +61 3 9244 4820 (Direct)
+61 3 9244 4986 (Office)
URL: http://www.tenix.com
Mailto:[EMAIL PROTECTED]
--
http://cms-list.org/
trim your replies for good karma.