> > Have you seen what the off the shelf OCR systems like OmniPage do these
> > days?
>
> Yes -- the performance is awful. And that's on ordinary printed text
> that's supposed to be readable, not on text that has been intentionally
> obfuscated.
My experience is otherwise. I use OmniPage 7.0 (an old version, on our Macs here at the school) to OCR out-of-copyright texts for placement online. All of these books are old, and many are dirty, ripped, and/or faded. Some are set in strange fonts, tiny font sizes, and multiple styles. Sometimes I can even see the text from the other side of the page showing through. OmniPage not only gets the text right (sometimes text I wasn't even sure about until I saw OmniPage's "guess"), but it also preserves italics, bold, subscripts, and superscripts. It recognizes columns, and it even detects and automatically reorients pages when I accidentally put the book in upside down. And this is 1997-or-earlier technology!

Scanning has come a *long* way from the old Kurzweil "washing machine" we used to scan Freud's text back in the eighties.

Jerry

-- 
[EMAIL PROTECTED]
http://www.sandiego.edu/~jerry/
Serra 188B/x8773
--
The more restrictions there are, the poorer the people become. The greater the government's power, the more chaotic the nation would become. The more the ruler imposes laws and prohibitions on his people, the more frequently evil deeds would occur.
--The Silence of the Wise: The Sayings of Lao Zi

_______________________________________________
Mailman-Developers mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman-21/listinfo/mailman-developers
