Re: [developers] [doc] Tesseract OCR engine adds Arabic support

Muayyad AlSadi Wed, 16 Nov 2011 07:45:34 -0800

Thank you, this is very good news.

> I think training could improve it (though I've
> no experience with Tesseract), but there is no training module for the
> Arabic recogniser yet (per the release note[2]).


The release notes say:

Added Cube, a new recognizer for Arabic. Cube can also be used in
combination with normal Tesseract for other languages with an
improvement in accuracy at the cost of (much) lower speed. There is no
training module for Cube yet.

Maybe it cannot be trained, by design.


On Wed, Nov 16, 2011 at 5:37 PM, Khaled Hosny <[email protected]> wrote:
>
> Hello all,
>
> Tesseract[1] 3.01, released last October, includes an Arabic recogniser.
>
> I just tested it with a scanned page from an old book typeset in Naskh (a
> fairly complex, but common font), and I got ~80% of the words recognised
> correctly. Most of the badly recognised words contain diacritics or dots,
> which seem to confuse it.
>
> I thought this would be of interest to people here.
>
> [1] http://code.google.com/p/tesseract-ocr/
> [2] http://code.google.com/p/tesseract-ocr/wiki/ReleaseNotes
>
> Regards,
>  Khaled
> _______________________________________________
> Doc mailing list
> [email protected]
> http://lists.arabeyes.org/mailman/listinfo/doc
_______________________________________________
Developer mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/developer
