> On Nov 20, 2021, at 11:15 AM, Jon Elson via cctalk <[email protected]> 
> wrote:
> 
> On 11/20/21 1:30 AM, Joerg Hoppe via cctalk wrote:
>> Hi Friends,
>> 
>> Micro fiche scans of the PDP-11 XXDP listings are online now:
> 
> Wow, took a quick look.  The scans are likely not good enough to run through 
> an OCR program, but certainly good enough to read through when trying to 
> understand what a program is doing.

I only tried tesseract once, years ago, and it wasn't useful at all for the 
particular material I gave it.  Quite possibly it's better now.

Instead, I ended up buying a commercial OCR program, FineReader from ABBYY, 
which has served me well.  I used it to read CDC 6600 wire list scans, and it 
handled those well.  I also tried it on the THE source listings in the Knuth 
archive; those are hopeless for OCR, partly because of the overprinting 
convention used, and required manual entry.

So... it might be worth feeding some of those images to current commercial 
OCR programs.  FineReader has a "learn" capability that does a decent job of 
adapting it to the peculiarities of a particular piece of source material.

        paul

