I discovered Tesseract, and found to my delight that a package already
exists in Ubuntu Feisty for it. However, it works poorly on color
or grayscale images. The ImageMagick convert program does a woeful job of
converting documents to black and white (not grayscale) before the
OCR program works on them.

I have tried converting the files to B/W TIFF myself, using Gimp. But
no matter how much I fiddle around with contrast settings, etc., the
black and white images are pathetic. I cannot believe that converting
grayscale images to black and white can't be done better than this.
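For readers less familiar with the step in question: turning grayscale into
black and white (binarization) means picking a cutoff, where pixels at or below
it become black and the rest become white. A fixed cutoff often fails on scans.
One common automatic choice, named here only as an illustration and not
something tried above, is Otsu's method. It picks the cutoff that best separates
the dark and light pixel populations. The minimal sketch below assumes 8-bit
grayscale values (0-255) already loaded into a flat Python list; the function
names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the cutoff t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, pixels > t the light class.
    Assumes `pixels` is a non-empty list of ints in the range 0..255.
    """
    # Histogram of the 256 possible gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_dark = 0      # running sum of gray levels in the dark class
    weight_dark = 0   # running pixel count of the dark class
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet, so no split to score
        weight_light = total - weight_dark
        if weight_light == 0:
            break     # every pixel is dark; further cutoffs change nothing
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor).
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map each pixel to 0 (black) if <= t, else 255 (white)."""
    return [0 if p <= t else 255 for p in pixels]


if __name__ == "__main__":
    # Toy "page": dark ink at gray level 10 on a light background at 200.
    page = [10] * 50 + [200] * 50
    t = otsu_threshold(page)
    print(t, binarize(page, t)[:3], binarize(page, t)[-3:])
```

This is a single global cutoff. Scans with uneven lighting or shading may need a
local (adaptive) threshold instead, which chooses a cutoff per neighbourhood.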

Any recommendations from others who have already gone through this agony?
How can we improve the input images for OCR programs like Tesseract?

Thanks,
  Sandip

_______________________________________________
ilugd mailinglist -- [email protected]
http://frodo.hserus.net/mailman/listinfo/ilugd
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi 
http://www.mail-archive.com/[email protected]/
