OCR itself is very prone to errors. I've had good experience applying 
transformations that reduce the color space of images (converting to 
grayscale or binarizing) before running OCR. I wonder why Tesseract 
doesn't do this itself.
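
Something along these lines is what I mean (a rough Python sketch, assuming 
pytesseract and Pillow are installed; the file name and threshold value are 
just placeholders you would tune per scan):

    import pytesseract
    from PIL import Image

    # Load the scan and collapse it to a single grayscale channel
    img = Image.open('receipt.png').convert('L')

    # Binarize: pixels darker than the threshold become black, the rest white
    img = img.point(lambda px: 0 if px < 150 else 255, mode='1')

    print(pytesseract.image_to_string(img))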

As for training, as far as I know Tesseract can be trained, though I don't 
know the process. I think the language files for Tesseract are actually 
training files.

Some links I found on the topic:

http://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03–3.05
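
If you do end up with a custom .traineddata file, pointing Tesseract at it 
from Python would look roughly like this (a sketch only; 'mylang' and the 
tessdata path are made-up examples that depend on what you name the file and 
where Tesseract keeps its data on your system):

    import pytesseract
    from PIL import Image

    img = Image.open('receipt.png')

    # lang selects which <name>.traineddata file Tesseract loads;
    # --tessdata-dir tells it which directory to search
    text = pytesseract.image_to_string(
        img,
        lang='mylang',
        config='--tessdata-dir /usr/share/tesseract-ocr/tessdata',
    )
    print(text)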


On Wednesday, February 14, 2018 at 9:33:05 PM UTC-4, David Reagan wrote:
>
> While experimenting with Mayan, I've noticed that the OCR is pretty 
> unreliable.
>
> CHRNGE instead of CHANGE, HOU instead of HOW, CRSHIER instead of CASHIER, 
> UUU instead of WWW, OOESTIONS instead of QUESTIONS, etc.
>
> Those are all examples on just one receipt. And the preview is pretty darn 
> good looking.
>
> So, is there a way to teach the OCR to get better? 
>
> Or some other way to improve OCR results? Maybe a newer version of 
> Tesseract?
>
>
>
