On Fri, 17 Oct 2008, James Tuttle wrote:
I wonder if any of you might have experience with creating text PDFs from TIFFs. I've been using tiffcp to stitch TIFFs together into a single image and then using tiff2pdf to generate PDFs from the single TIFF. I've had to pass this image-based PDF to someone with Acrobat to use its batch processing facility to OCR the text and save a text-based PDF. I wonder if anyone has suggestions for software I can integrate into the script (Python on Linux) I'm using.
I don't, but I have used Acrobat's batch processing to do the OCR before -- and I'd strongly suggest backing up the files before running the batch.
I selected the wrong option, and instead of ending up with image+text, it stripped out the image and saved over the original files (wiping out a week's worth of scanning for me).
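Since the stitching already runs from a Python script, the backup can be automated. The following is a minimal sketch, assuming the script shells out to the libtiff command-line tools tiffcp and tiff2pdf via subprocess. The function names backup_files and tiffs_to_pdf, and the backup_root folder layout, are hypothetical. It copies the source TIFFs into a fresh, uniquely named folder before anything else touches them.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def backup_files(paths, backup_root):
    """Copy each file into a new, uniquely named folder under backup_root.

    Hypothetical helper: assumes the source file names are unique, since all
    copies land flat in one folder. Returns the folder holding the copies.
    """
    root = Path(backup_root)
    root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # mkdtemp guarantees a new folder even if two backups start in the same second
    dest = Path(tempfile.mkdtemp(prefix=f"backup-{stamp}-", dir=root))
    for p in paths:
        # copy2 also preserves modification times and permission bits
        shutil.copy2(p, dest / Path(p).name)
    return dest


def tiffs_to_pdf(tiffs, out_pdf, backup_root):
    """Hypothetical pipeline: back up, stitch with tiffcp, convert with tiff2pdf."""
    backup_files(tiffs, backup_root)
    stitched = Path(out_pdf).with_suffix(".tif")
    # tiffcp takes one or more input files followed by the output file
    subprocess.run(["tiffcp", *map(str, tiffs), str(stitched)], check=True)
    # tiff2pdf writes the PDF to the file named by -o
    subprocess.run(["tiff2pdf", "-o", str(out_pdf), str(stitched)], check=True)
    return Path(out_pdf)
```

The backup happens before any external tool runs, so a failed or misconfigured step, or a later batch job that saves over its inputs, still leaves the original scans recoverable.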
I've also never found a good way of editing the 'tags' that Acrobat generates: it marks up each line of the document as a new paragraph, and I couldn't find any good tools to merge the tags (although I was running an older version of Acrobat, 6, I think).
-Joe
