Thank you, this is very good news.

> I think training could improve it (though I've
> no experience with Tesseract), but there is no training module for the
> Arabic recogniser yet (per the release notes[2]).

The release notes say:

  "Added Cube, a new recognizer for Arabic. Cube can also be used in
  combination with normal Tesseract for other languages with an
  improvement in accuracy at the cost of (much) lower speed. There is
  no training module for Cube yet."

Maybe it cannot be trained, by design.

On Wed, Nov 16, 2011 at 5:37 PM, Khaled Hosny <[email protected]> wrote:
>
> Hello all,
>
> Tesseract[1] 3.01, released last October, includes an Arabic recogniser.
>
> I just tested it on a scanned page of an old book typeset in Naskh (a
> fairly complex, but common font), and got ~80% of the words recognised
> correctly. Most of the badly recognised words contain diacritics or dots,
> which seem to confuse it.
>
> I thought this would be of interest to people here.
>
> [1] http://code.google.com/p/tesseract-ocr/
> [2] http://code.google.com/p/tesseract-ocr/wiki/ReleaseNotes
>
> Regards,
> Khaled
> _______________________________________________
> Doc mailing list
> [email protected]
> http://lists.arabeyes.org/mailman/listinfo/doc

_______________________________________________
Developer mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/developer

