I've run into that situation too, with listings so difficult that even a 
commercial OCR program (FineReader) couldn't handle them.  At the time Tesseract 
was far less capable, though I haven't tried it recently to see if that has 
changed.

Anyway, my experience was that the task was hard enough that it needed someone 
with knowledge of the material.  It may be that a contract typist could do a 
tolerable job, but I have my doubts.  If you see, say, an obsolete assembly 
language program merely as a random collection of characters, typing it in is 
going to produce more errors than if you actually understand what the material 
means.

One consideration is the effort required to repair transcription errors.  Those 
that produce syntax errors aren't such an issue; those that pass the assembler 
or compiler but result in bugs (say, a mistyped register number) are harder to 
find.

        paul

> On Jan 22, 2022, at 8:57 PM, Mark Kahrs via cctalk <[email protected]> 
> wrote:
> 
> No, OCR totally fails on olde line printer listing.  At least the ones I've
> tried (tesseract, online, ...)
> 
> 
> 
> On Sat, Jan 22, 2022 at 8:06 PM Ethan O'Toole <[email protected]> wrote:
> 
>> 
>> Can the listings be OCR'ed?
>> 
>>                        - Ethan
>> 
>> 
>>> Has anyone ever used Amazon Mechanical Turk to employ typists to type in
>>> old listings of lost code?
>>> 
>>> Asking for a friend.
