> On Nov 20, 2021, at 11:15 AM, Jon Elson via cctalk <[email protected]>
> wrote:
>
> On 11/20/21 1:30 AM, Joerg Hoppe via cctalk wrote:
>> Hi Friends,
>>
>> Micro fiche scans of the PDP-11 XXDP listings are online now:
>
> Wow, took a quick look. The scans are likely not good enough to run through
> an OCR program, but certainly good enough to read through when trying to
> understand what a program is doing.

I only tried Tesseract once, years ago, and it wasn't useful at all for the
particular material I gave it. Quite possibly it's better now.

Instead, I ended up buying a commercial OCR program, FineReader from ABBYY,
which has served me well. I used it to read CDC 6600 wire list scans, which it
handled well. I also tried it on the THE source listings in the Knuth archive;
those proved hopeless for OCR, partly because of the overprinting convention
used, and required manual entry.

So... it might be worth a try feeding some of those images to current
commercial OCR programs. FineReader has a "learn" capability that does a
decent job of adapting it to the peculiarities of a particular piece of
source material.

paul