On 1/23/22 10:16, Paul Koning via cctalk wrote:
> Maybe. But OCR programs have had learning features for decades. I've spent
> quite a lot of time in FineReader learning mode. Material produced on a
> moderate-quality typewriter, like the CDC 6600 wire lists on Bitsavers, can
> be handled tolerably well. Especially with post-processing that knows what
> the text patterns should be and converts common misreadings to what they
> should be. But the listings I mentioned before were entirely unmanageable
> even after a lot of "learning mode" effort. An annoying wrinkle was that I
> wasn't dealing with greenbar but rather with Dutch line printer paper that
> has every other line marked with 5 thin horizontal lines, almost like music
> score paper. Faded printout with a worn ribbon on a substrate like that is a
> challenge even for human eyeballs, and all the "machine learning" hype can't
> conceal the fact that no machine can come anywhere close to a human for
> dealing with image recognition under tough conditions.

The problem is that OCR needs to be 100% accurate for many purposes.
Anything much short of that requires that the result be inspected by hand,
line by line, with the knowledge of what makes sense. Mistaking a single
fuzzy 8 for a 6 or a 3, for example, can render code inoperative with a
very difficult-to-locate bug. Perhaps an AI might be programmed to
separate out the nonsense typos.

Old high-speed line printers weren't always wonderful at timing the hammer
strikes. I recall some nearly impossible-to-read Univac 1108 engineering
documents, printed on a drum printer. Gave me headaches.

At least that's my take.

--Chuck
