[doc] Tesseract OCR engine adds Arabic support

Khaled Hosny Wed, 16 Nov 2011 07:37:52 -0800

Hello all,

Tesseract[1] 3.01, released last October, includes an Arabic recogniser.


I just tested it with a scanned page of an old book typeset in Naskh (a
fairly complex, but common font) and got ~80% of the words recognised
correctly. Most of the badly recognised words contain diacritics or dots,
which seem to confuse it. I think training could improve it (though I
have no experience with Tesseract), but there is no training module for
the Arabic recogniser yet (per the release notes[2]).

I thought this would be of interest to people here.

[1] http://code.google.com/p/tesseract-ocr/
[2] http://code.google.com/p/tesseract-ocr/wiki/ReleaseNotes

Regards,
 Khaled
_______________________________________________
Doc mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/doc
