Hi everyone,

Here is my list of objectives for the meet.

Objectives/Deliverables for Indic Meet

I shall first demonstrate the working of the OCR on some sample images. Then I plan to explain the working of the OCR system at a higher level. This will be followed by a demonstration of the problems that exist in the present system and the potential solutions that I have in mind. I shall also demonstrate how to train this OCR for a particular language. This should be over in 75 minutes.

Then we move on to the problems I am facing and have a discussion on possible solutions. Here are a few problems to tackle:

1) Learning about the various efforts made in the past (BOCRA, Aksharbodh, etc.)
2) Dealing with the post-OCR spell-checker problem
3) A better segmentation algorithm, e.g. the Ocropus curved cut segmenter, and its merits/demerits
4) Reducing the number of character classes to be trained, as explained at http://hacking-tesseract.blogspot.com/2009/05/bengali-stats.html
5) Talking to Santhosh Thottingal about integrating the service into Silpa
6) How to build a web interface that can train the OCR engine from user input

Taken from http://hacking-tesseract.blogspot.com/2009/05/issues-for-indic-meet.html

--
Regards,
Debayan Banerjee
Support Free Software
http://deeproot.in
