OCR itself is very prone to errors. I've had good results applying transformations that reduce the color space of images (grayscale or plain black-and-white) before running OCR on them. I wonder why Tesseract doesn't do this itself.
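For example, something like the following has worked for me as a quick preprocessing pass. It's only a minimal sketch in Python, assuming pytesseract and Pillow are installed; the file name "receipt.png" and the threshold value are placeholders you would tune for your own scans.

    # Minimal sketch: reduce the color space before handing the image to Tesseract.
    # Assumes pytesseract and Pillow are installed; "receipt.png" and the 140
    # threshold are placeholders to adjust for your own documents.
    from PIL import Image
    import pytesseract

    image = Image.open("receipt.png")

    # Step 1: drop the color information entirely (grayscale).
    gray = image.convert("L")

    # Step 2: binarize with a fixed threshold so the OCR engine sees
    # clean black text on a white background.
    bw = gray.point(lambda px: 255 if px > 140 else 0, mode="1")

    print(pytesseract.image_to_string(bw))

Whether this helps depends a lot on the scans themselves; noisy or low-contrast receipts may need a smarter (adaptive) threshold instead of a fixed one.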
As for training: as far as I know, Tesseract can be trained, though I don't know the process myself. I believe the language files that Tesseract ships with are actually training data. Some links I found on the topic:

http://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03–3.05

On Wednesday, February 14, 2018 at 9:33:05 PM UTC-4, David Reagan wrote:
>
> While experimenting with Mayan, I've noticed that the OCR is pretty
> unreliable.
>
> CHRNGE instead of CHANGE, HOU instead of HOW, CRSHIER instead of CASHIER,
> UUU instead of WWW, OOESTIONS instead of QUESTIONS, etc.
>
> Those are all examples on just one receipt. And the preview is pretty darn
> good looking.
>
> So, is there a way to teach the OCR to get better?
>
> Or some other way to improve OCR results? Maybe a newer version of
> Tesseract?
