Salam

As mentioned earlier:
http://lists.arabeyes.org/archives/developer/2006/September/msg00013.html

It may be worthwhile, and faster, to implement Arabic support in
Tesseract-ocr.

The important thing is Unicode support. Tesseract 2.0
(http://code.google.com/p/tesseract-ocr/) can read and output Unicode
and can be trained for any language whose characters are not joined.

What it lacks is described on the training page:

> Tesseract can only handle left-to-right languages. While you can get 
> something out with a right-to-left language, the output file will be 
> ordered as if the text were left-to-right. Top-to-bottom languages 
> will currently be hopeless.
>
> Tesseract is unlikely to be able to handle connected scripts like 
> Arabic. It will take some specialized algorithms to handle this case, 
> and right now it doesn't have them.
>
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
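For a right-to-left script whose letters are NOT joined (Hebrew, for example), the limitation quoted above means each output line comes out in left-to-right visual order. A minimal post-processing sketch, assuming each line contains only RTL characters with no embedded numbers, Latin words or combining marks, simply reverses every line. This is not a Tesseract feature; the function names are hypothetical, and it does nothing for the connected-script problem that Arabic poses.

```python
def visual_to_logical(line: str) -> str:
    # Assumption: the line is purely right-to-left text, so the
    # left-to-right visual order is exactly the reverse of logical order.
    # Mixed text (digits, Latin words) or combining marks would need the
    # full Unicode bidirectional algorithm instead of a plain reversal.
    return line[::-1]


def fix_rtl_output(text: str) -> str:
    # Reverse each line on its own; the top-to-bottom line order is kept.
    return "\n".join(visual_to_logical(line) for line in text.split("\n"))


if __name__ == "__main__":
    # Hypothetical OCR output: the Hebrew word "שלום" emitted left to right.
    print(fix_rtl_output("םולש"))
```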

I did a very simple test:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/b1b27838c68681ab

If you can help, please do.

Note: As far as I know, there is currently NO working Arabic-capable
OCR engine, free or otherwise. I doubt that Sakhr's software can detect
anything.

--alnokta
_______________________________________________
Developer mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/developer
