> On Dec 3, 2025, at 10:55 AM, Adrian Godwin via cctalk <[email protected]>
> wrote:
>
> I don't think it's the general quality of the patent print that's poor,
> it's the line-printer listing section from
> https://www.hp9845.net/9845/downloads/patents/US4089059.pdf starting at
> about page 213 of the pdf , possibly section 26 of the patent.
>
> The print in that section is much paler than the rest - typical of a worn
> line-printer ribbon. I doubt the printed copy is any better. I'm only
> trying to OCR the listing, not the rest of the patent.

That's quite a clean listing, actually, cleaner than most I have worked with
and dramatically better than some. The sort of slightly-damaged characters
that appear should be no problem at all for the "training" feature of ABBYY
FineReader to deal with. What you'd have to do is run a number of pages
through it in training mode, so it sees a number of variations of the
individual characters. And as I mentioned, you'd do all the scanning in the
mode where it only accepts what it was trained with, no "builtin" patterns.
That way it won't make up stuff that isn't part of the character set but
happens to match something built-in, like a pound-sterling sign.

It may be that scanning the listing as a table (with the various columns as
table columns) will work well, and give you the layout explicitly. Or it can
be scanned as plain text, but in that case the spacing will mostly turn into
individual spaces and you'd need post-processing to insert tabs etc. to make it
look right again. Given the simple assembler syntax involved that sort of
post-processing would not be hard.
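
To give an idea of what I mean by post-processing, here is a rough Python
sketch. It is not tied to the actual listing in the patent: it assumes a
hypothetical layout in which a label, if present, starts in column 1, lines
without a label start with a space, full-line comments start with "*", and
every instruction has an operand before any trailing comment. The function
name realign is made up for the example. The real listing likely has other
columns (addresses, object code) that would need their own handling.

```python
def realign(line: str) -> str:
    """Turn one single-spaced OCR line into tab-separated assembler fields.

    Assumed (hypothetical) layout: [label] opcode operand [comment],
    with a label only when the line starts in column 1.
    """
    line = line.rstrip()
    # Blank lines and full-line comments ("*" is an assumed marker) pass through.
    if not line.strip() or line.lstrip().startswith("*"):
        return line.strip()
    has_label = not line[0].isspace()
    # Split off the leading fields; whatever remains is kept whole as the comment.
    parts = line.split(None, 3 if has_label else 2)
    if not has_label:
        parts.insert(0, "")  # empty label field keeps the columns lined up
    return "\t".join(parts)


if __name__ == "__main__":
    sample = [
        "* SAMPLE ROUTINE",
        "LOOP LDA COUNT GET THE COUNTER",
        " ADA ONE",
    ]
    for text in sample:
        print(realign(text))
```

An opcode with a comment but no operand would be split wrongly by this, so a
real version would want a table of the opcodes that take no operand. It also
depends on the OCR output keeping the leading space on unlabeled lines.
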
paul