Re: [Ankur-core] Bangla OCR progress
Sayamindu Dasgupta wrote:
| This guy seems to be making some interesting progress on a Bangla OCR
| - or more precisely, enabling Bangla in Tesseract.
| http://debayanin.googlepages.com/hackingtesseract
| Looks like he needs some more training data - can we provide him with some?

As an aside, he is also working with the Swatantra Malayalam Computing group to fix OCR issues in ml_IN. And I'd request that someone validate how much progress he is making in terms of accuracy.

-- 
You see things; and you say 'Why?'; But I dream things that never were; and I say 'Why not?' - George Bernard Shaw
www.linkedin.com/in/sankarshan

___
Bengalinux-core mailing list
Bengalinux-core@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bengalinux-core
Re: [Ankur-core] Bangla OCR progress
On Wed, Jul 2, 2008 at 9:32 AM, Sayamindu Dasgupta [EMAIL PROTECTED] wrote:
> This guy seems to be making some interesting progress on a Bangla OCR
> - or more precisely, enabling Bangla in Tesseract.
> http://debayanin.googlepages.com/hackingtesseract

Yes, it definitely looks interesting.

> Looks like he needs some more training data - can we provide him with some?

If I remember correctly, there was a sample file for testing the completeness of Bengali fonts. Since it has all the letters and conjuncts typed in, the file might be useful for training Tesseract as well.

Deepayan should be able to give some input here. He has working experience with R and may have some training samples as well.

Cheers,
Golam
-- 
http://gravity.psu.edu/~hossain/
Re: [Ankur-core] Bangla OCR progress
On 7/2/08, Golam Mortuza Hossain [EMAIL PROTECTED] wrote:
> On Wed, Jul 2, 2008 at 9:32 AM, Sayamindu Dasgupta [EMAIL PROTECTED] wrote:
>> This guy seems to be making some interesting progress on a Bangla OCR
>> - or more precisely, enabling Bangla in Tesseract.
>> http://debayanin.googlepages.com/hackingtesseract

Cool. I had some interaction with the tesseract/ocropus folks, and it sounded like a good base. It's nice that someone's actually doing something with it.

It takes the old matra removal approach, and he's facing the same problems I did (notice in his first example that গ is segmented into two parts, and শু is not). On the other hand, having something that works even partly is a good start.

> Yes, it definitely looks interesting.
>
>> Looks like he needs some more training data - can we provide him with some?
>
> If I remember correctly, there was a sample file for testing the completeness
> of Bengali fonts. Since it has all the letters and conjuncts typed in, the file
> might be useful for training Tesseract as well.
>
> Deepayan should be able to give some input here. He has working experience
> with R and may have some training samples as well.

Well, we have a bunch of Unicode documents. For some of them, I have print versions too, and can scan them if needed. A simpler approach would be to render them using different fonts and take screenshots.

Apparently he also needs some box files, whatever they are, which need to be produced using Tesseract. I haven't installed Tesseract yet; I will try, but let me know if anyone else manages to.

-Deepayan
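A note on the box files mentioned in the last message: as far as I understand it, a Tesseract box file is a plain-text file with one line per character of a training image, giving the character followed by its bounding box as left, bottom, right, top in pixels, with the origin at the bottom-left corner of the image (later Tesseract versions append a page number). The sketch below is only an illustration under that assumption; the names `Box`, `parse_box_file`, and `to_top_left`, and the sample coordinates, are hypothetical and not taken from the thread or from Tesseract itself.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """One box-file entry; coordinates assume a bottom-left origin."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_line(line: str) -> Box:
    """Parse '<symbol> <left> <bottom> <right> <top> [<page>]'."""
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(parts)}: {line!r}")
    left, bottom, right, top = (int(p) for p in parts[1:5])
    return Box(parts[0], left, bottom, right, top)


def parse_box_file(text: str) -> List[Box]:
    """Parse every non-blank line of a box file's contents."""
    return [parse_box_line(ln) for ln in text.splitlines() if ln.strip()]


def to_top_left(box: Box, image_height: int) -> Tuple[int, int, int, int]:
    """Convert to (x, y, width, height) with a top-left origin,
    the convention most image libraries use."""
    return (box.left, image_height - box.top,
            box.right - box.left, box.top - box.bottom)


# Made-up sample data: two glyphs on an image 100 pixels tall.
SAMPLE = "গ 10 20 42 60\nশু 50 18 95 64\n"
boxes = parse_box_file(SAMPLE)
```

A conjunct or a consonant with a vowel sign, such as শু, stays in one entry here only because the whole cluster is the first whitespace-separated field; whether the segmenter actually produces a single box for it is the problem discussed above.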