[cctalk] Re: Are there any useful OCR programs for scanning old listings and producing text with proper formatting

Paul Koning via cctalk Mon, 15 May 2023 08:06:19 -0700

> On May 12, 2023, at 10:03 PM, trash80--- via cctalk <[email protected]> 
> wrote:
> 
> An issue dear to my heart - I have quite a quantity of documentation here to 
> scan so I did quite a bit of homework and testing on this. It may not exactly 
> be your issue, but I hope it helps. 
> 
> For me, the issue was the quality of the documentation - print back in those 
> days was quite variable of course and over time documents may have 
> deteriorated.
> 
> I find Adobe Acrobat the best tool using the ClearScan method for OCR. 

Admittedly it's been quite a while, but years ago Adobe briefly offered a free 
OCR plugin for Adobe Acrobat (full edition).  I tried it on one or two 
documents -- for example a high quality scan of the Ethernet V2 spec (DIX 
spec).

It sort of worked, but the results were very bad.  No training capability, and 
the editing features were even worse than the already pathetically bad PDF 
editing features of Acrobat.

I also used it on a scan of the A-10A flight manual.  Same sort of outcome: it 
sort of worked, but the quality was really poor.

After that experience I tried Tesseract, which at the time wasn't ready yet.  
(That was before the current neural net version.)  Ended up buying ABBYY 
FineReader, which was much better, particularly because it has a good quality 
training mechanism.  I've still encountered material so bad that it isn't 
usable, but it can handle a lot of stuff, including line printer listings, well 
enough.

One of these days I should try the new Tesseract.  

        paul
