Funny, I think linuxz, oomlx and I talked about this on #arabeyes.
A paper has been published outlining some nice methods for Arabic OCR,
but the math in it is beyond me.

Here is the paper:
http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-495.pdf
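The Tesseract training notes quoted below say it cannot handle connected scripts like Arabic. One small illustration of why: in a connected script the same letter is drawn differently depending on whether it joins its neighbours. Unicode even encodes these contextual shapes separately ("Arabic Presentation Forms-B"). The sketch below uses only the Python standard library and shows that the four shapes of the letter BEH all normalise (NFKC) back to one base letter. A per-character recogniser would therefore have to learn four glyphs for one letter, and before that it would need to split joined letters apart. The dictionary and function names are illustrative only.

```python
import unicodedata

# Presentation-form code points for the Arabic letter BEH, one per
# contextual shape (illustrative table, not from any OCR engine).
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

def base_letter(glyph: str) -> str:
    """Map a contextual presentation form back to its base letter
    using NFKC compatibility normalisation."""
    return unicodedata.normalize("NFKC", glyph)

if __name__ == "__main__":
    for shape, ch in BEH_FORMS.items():
        print(shape, unicodedata.name(ch), "->", unicodedata.name(base_letter(ch)))
```

Running it prints each shape's Unicode name next to the single base letter it maps to.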

On 9/25/07, Mohamed Magdy <[EMAIL PROTECTED]> wrote:
>
> Salam
>
> As mentioned earlier
> http://lists.arabeyes.org/archives/developer/2006/September/msg00013.html
>
> It may be worthwhile, and faster, to implement Arabic support in
> Tesseract-ocr.
>
> The important thing is Unicode support. Tesseract 2.0
> http://code.google.com/p/tesseract-ocr/ can use and understand Unicode
> and can be trained for any language whose characters are not
> joined.
>
> What it lacks is described on the training page:
>
> > Tesseract can only handle left-to-right languages. While you can get
> > something out with a right-to-left language, the output file will be
> > ordered as if the text were left-to-right. Top-to-bottom languages
> > will currently be hopeless.
> >
> > Tesseract is unlikely to be able to handle connected scripts like
> > Arabic. It will take some specialized algorithms to handle this case,
> > and right now it doesn't have them.
> >
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
>
> I did a very, very simple test:
>
> http://groups.google.com/group/tesseract-ocr/browse_thread/thread/b1b27838c68681ab
>
> If you can help, please, please do so.
>
> Note: As far as I know, there is currently NO working Arabic-capable
> OCR engine, free or otherwise. I doubt Sakhr software can detect
> anything.
>
> --alnokta
> _______________________________________________
> Developer mailing list
> [email protected]
> http://lists.arabeyes.org/mailman/listinfo/developer
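The training-page limitation quoted above (right-to-left text comes out "ordered as if the text were left-to-right") can be illustrated with a small post-processing sketch. It assumes the recogniser yields characters with the x coordinate of each glyph's bounding box. The `Glyph` type and both functions are hypothetical, not Tesseract's API. Reading order for a right-to-left line is rightmost glyph first. For an already serialised line of *pure* RTL text, reversing the string has the same effect. Lines that mix in digits or Latin words would need the full Unicode Bidirectional Algorithm instead.

```python
from typing import List, NamedTuple

class Glyph(NamedTuple):
    char: str  # recognised character (hypothetical classifier output)
    left: int  # x coordinate of the glyph's bounding box, in pixels

def visual_to_logical_rtl(glyphs: List[Glyph]) -> str:
    """Order one line of pure right-to-left script for reading:
    the glyph with the largest x coordinate comes first."""
    ordered = sorted(glyphs, key=lambda g: g.left, reverse=True)
    return "".join(g.char for g in ordered)

def ltr_output_to_rtl(line: str) -> str:
    """Undo left-to-right ordering of an already serialised line.
    Only valid for pure RTL text; embedded numbers or Latin words
    would be reversed incorrectly."""
    return line[::-1]

if __name__ == "__main__":
    # "salam" laid out on the page: meem is leftmost, seen is rightmost.
    line = [Glyph("\u0645", 10), Glyph("\u0627", 30),
            Glyph("\u0644", 50), Glyph("\u0633", 70)]
    print(visual_to_logical_rtl(line))
```

Sorting by bounding-box position avoids depending on whatever order the engine happens to emit. The string-reversal helper is only a fallback for output that has already lost its coordinates.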
