I discovered Tesseract and found, to my delight, that a package for it already exists in Ubuntu Feisty. However, it works poorly on color or grayscale images. The ImageMagick convert program does a woeful job of converting documents to black and white (not grayscale) before the OCR program works on them.
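To make the conversion step concrete: going from grayscale to pure black and white comes down to choosing an intensity threshold. Below is a minimal sketch in plain Python, assuming Otsu's method for picking one global threshold. That is my choice for illustration, not necessarily what convert or Gimp do internally, and the function names otsu_threshold and binarize are hypothetical. It assumes the image is already a flat list of 8-bit gray values.

```python
def otsu_threshold(pixels):
    """Pick a global threshold (0-255) by maximizing between-class variance.

    pixels: iterable of ints in 0..255 (a flattened grayscale image).
    Values <= threshold are treated as dark, the rest as light.
    """
    # Histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their gray values

    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class

        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor.
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t

    return best_t


def binarize(pixels, threshold):
    """Map each gray value to 0 (black) or 255 (white)."""
    return [0 if p <= threshold else 255 for p in pixels]


if __name__ == "__main__":
    # Tiny made-up sample: three dark pixels and three light ones.
    sample = [10, 12, 11, 200, 210, 205]
    t = otsu_threshold(sample)
    print(t, binarize(sample, t))
```

A single global threshold like this tends to struggle on scans with uneven lighting, which may be part of why the results look so bad.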
I have tried converting the files to B/W TIFF myself, using Gimp. But no matter how much I fiddle with the contrast settings, etc., the black and white images are pathetic. I cannot believe that conversion of grayscale images to black and white cannot be done better than this.

Any recommendations from others who have already gone through this agony? How can we improve the input images for OCR programs like Tesseract?

Thanks,
Sandip
_______________________________________________
ilugd mailinglist
http://frodo.hserus.net/mailman/listinfo/ilugd
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi
