Does anyone know of a library that can transliterate Bengali into English (Latin script)? There are tools that do the reverse. I need this to clear the last remaining roadblock in OCR.

Tesseract-OCR stores its dictionaries for lookup in a data structure called a directed acyclic word graph (DAWG). After OCR has been performed, the system matches the output against the entries in this DAWG file. Unfortunately, the data structure is not suited to complex scripts like ours: <http://groups.google.com/group/tesseract-ocr/browse_thread/thread/5495c4e348a4b272/a6dcfe5d92babb35?lnk=gst&q=dawg%2Bwieghts#a6dcfe5d92babb35>

There are two possible solutions:

1) I figure out a suitable data structure that handles Indic scripts and implement it.
2) I transliterate the entire dictionary and the OCR output into English (26 characters instead of the 500-odd for Bengali) and then match.

I think this should work. Any suggestions?
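To make option 2 concrete, here is a rough Python sketch of what I have in mind. The mapping table is a tiny, made-up subset (loosely ITRANS-like, not any standard scheme), the names `transliterate`, `build_index` and `is_known` are hypothetical, and nothing here touches Tesseract itself. It only shows the idea of romanizing both the dictionary and the OCR output and matching on the ASCII forms. Note that a scheme this simple is not guaranteed to be collision-free.

```python
# Illustrative, incomplete tables: Bengali code point -> ASCII token.
CONSONANTS = {
    "\u0995": "k",  # ক
    "\u09A4": "t",  # ত
    "\u09A8": "n",  # ন
    "\u09AC": "b",  # ব
    "\u09AE": "m",  # ম
    "\u09B0": "r",  # র
    "\u09B2": "l",  # ল
}
VOWEL_SIGNS = {
    "\u09BE": "aa",  # া
    "\u09BF": "i",   # ি
    "\u09C1": "u",   # ু
    "\u09C7": "e",   # ে
    "\u09CB": "o",   # ো
}
OTHER = {
    "\u0985": "a",   # অ (independent vowel)
    "\u0986": "aa",  # আ (independent vowel)
    "\u0987": "i",   # ই (independent vowel)
    "\u0982": "ng",  # ং (anusvara)
}
VIRAMA = "\u09CD"   # ্ (hasanta): suppresses the inherent vowel
INHERENT = "a"      # assumed romanization of the inherent vowel


def transliterate(word):
    """Romanize a Bengali word using the small tables above."""
    out = []
    pending = False  # True while the last consonant still owes its inherent vowel
    for ch in word:
        if ch in CONSONANTS:
            if pending:
                out.append(INHERENT)
            out.append(CONSONANTS[ch])
            pending = True
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])  # replaces the inherent vowel
            pending = False
        elif ch == VIRAMA:
            pending = False              # conjunct: no vowel emitted
        elif ch in OTHER:
            if pending:
                out.append(INHERENT)
            out.append(OTHER[ch])
            pending = False
        else:
            # Make gaps in the table visible instead of guessing.
            raise ValueError("no mapping for U+%04X" % ord(ch))
    if pending:
        out.append(INHERENT)
    return "".join(out)


def build_index(dictionary_words):
    """Transliterate the whole dictionary once, up front."""
    return {transliterate(w) for w in dictionary_words}


def is_known(ocr_word, index):
    """Check an OCR result against the romanized dictionary."""
    return transliterate(ocr_word) in index


index = build_index(["\u09AC\u09BE\u0982\u09B2\u09BE", "\u0995\u09B2\u09AE"])  # বাংলা, কলম
print(is_known("\u09AC\u09BE\u0982\u09B2\u09BE", index))
```

In a real setup the romanized word list would be what gets compiled into the DAWG, rather than a Python set, and the table would have to cover the whole Bengali block including conjuncts.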
[1] http://hacking-tesseract.blogspot.com/
[2] http://code.google.com/p/tesseract-ocr

--
Be Intelligent, Use GNU/Linux
http://debayanin.googlepages.com/
http://debayan.wordpress.com
http://lug.nitdgp.ac.in