I've just started playing with aspell. I'd like to use it to improve OCR
text automatically. I know the output won't be perfect; I just want it to be
better than the input. I've experimented a bit and found that by inserting
aspell's first suggestion for words it doesn't find in its dictionary, I get
a reasonably good result. It would be much better, though, if aspell's
algorithms were oriented toward the kinds of mistakes OCR engines make
rather than the kinds made by human typists. I can see how you might do this
by working with the translation tables for the phonetic code, the keyboard
files, etc. Before I go any further, I thought I'd ask whether anyone else
has already gone down this route. Does anyone have files to share, or advice
on how to proceed, or warnings to go back now?
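
A minimal sketch of the first-suggestion approach described above, assuming aspell is installed with an English dictionary and is run in its ispell-compatible pipe mode (`aspell -a`). The function names are hypothetical and chosen only for illustration, and nothing here is tuned for OCR errors; it only takes whatever aspell ranks first.

```python
import re
import subprocess


def first_suggestions(aspell_output):
    """Map each unknown word to the first suggestion aspell listed for it.

    Parses the pipe-mode output of `aspell -a`, in which a line such as
    "& tbe 5 0: the, tube, tie" carries the suggestions for "tbe".
    Lines starting with "*" (word found) or "#" (no suggestions) are skipped.
    """
    fixes = {}
    for line in aspell_output.splitlines():
        if line.startswith("& "):
            head, _, tail = line.partition(": ")
            word = head.split()[1]
            # Keep the first suggestion seen for each distinct word.
            fixes.setdefault(word, tail.split(", ")[0])
    return fixes


def correct_text(text, fixes):
    """Replace every whole word that has an entry in `fixes`."""
    return re.sub(
        r"[A-Za-z']+",
        lambda m: fixes.get(m.group(0), m.group(0)),
        text,
    )


def aspell_pipe(text, lang="en"):
    """Return the raw `aspell -a` output for `text`.

    Assumption: an `aspell` binary and a dictionary for `lang` are available.
    """
    # A leading "^" tells pipe mode to treat the line as text, not a command.
    payload = "".join("^" + line + "\n" for line in text.splitlines())
    result = subprocess.run(
        ["aspell", "-a", "--lang=" + lang],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With these helpers, `correct_text(page, first_suggestions(aspell_pipe(page)))` would apply the first suggestion to every unknown word in `page`, leaving words with no suggestion untouched.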

Peter Binkley
Digital Initiatives Technology Librarian
email: [EMAIL PROTECTED]
phone: (780) 492-3743
fax: (780) 492-9243
post: Cameron Library 4-30
      University of Alberta
      Edmonton Alberta 
      Canada T6G 2J8


_______________________________________________
aspell-user mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/aspell-user