Salam,

As mentioned earlier:
http://lists.arabeyes.org/archives/developer/2006/September/msg00013.html

It may be worthwhile, and faster, to implement Arabic support in Tesseract-ocr. The important thing is Unicode support. Tesseract 2.0
http://code.google.com/p/tesseract-ocr/
can use and understand Unicode, and it can be trained for any language whose characters are not joined. What it lacks is described on the training page:

> Tesseract can only handle left-to-right languages. While you can get
> something out with a right-to-left language, the output file will be
> ordered as if the text were left-to-right. Top-to-bottom languages
> will currently be hopeless.
>
> Tesseract is unlikely to be able to handle connected scripts like
> Arabic. It will take some specialized algorithms to handle this case,
> and right now it doesn't have them.
>
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract

I did a very simple test:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/b1b27838c68681ab

If you can help, please do so.

Note: As far as I know, there is currently NO working Arabic-capable OCR engine, free or otherwise. I doubt that Sakhr's software can detect anything.

--alnokta

