Re: [Ankur-core] Bangla OCR progress

2008-07-02 Thread Sankarshan (সঙ্কর্ষণ)

Sayamindu Dasgupta wrote:

| This guy seems to be doing some interesting progress for a Bangla OCR
| - or more precisely, enabling Bangla in Tesseract.
| http://debayanin.googlepages.com/hackingtesseract
| Looks like he needs some more training data - can we provide him with
| some?

As an aside, he is working with the Swatantra Malayalam Computing group
to fix OCR issues in ml_IN too.

Also, I'd ask someone to validate how much progress he is making on
accuracy.



--



You see things; and you say 'Why?';
But I dream things that never were;
and I say 'Why not?' - George Bernard Shaw
www.linkedin.com/in/sankarshan




-
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
___
Bengalinux-core mailing list
Bengalinux-core@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bengalinux-core


Re: [Ankur-core] Bangla OCR progress

2008-07-02 Thread Golam Mortuza Hossain
On Wed, Jul 2, 2008 at 9:32 AM, Sayamindu Dasgupta [EMAIL PROTECTED] wrote:
> This guy seems to be doing some interesting progress for a Bangla OCR
> - or more precisely, enabling Bangla in Tesseract.
> http://debayanin.googlepages.com/hackingtesseract

Yes, it definitely looks interesting.

> Looks like he needs some more training data - can we provide him with some?

If I remember correctly, there was a sample file for testing the completeness
of Bengali fonts. Since it has all letters and conjuncts typed in, the file
might be useful for training Tesseract as well.
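
As a rough illustration of how such a sample file could be checked for coverage before it is used as training text, here is a minimal Python sketch. It assumes that "all letters" can be approximated by every code point in the Unicode Bengali block whose name starts with BENGALI LETTER; the names bengali_letters, conjunct and coverage_gaps are hypothetical and have nothing to do with Tesseract itself.

```python
import unicodedata

HASANTA = "\u09cd"  # BENGALI SIGN VIRAMA, joins consonants into a conjunct


def bengali_letters():
    """Code points in U+0980..U+09FF whose Unicode name is 'BENGALI LETTER ...'."""
    letters = []
    for cp in range(0x0980, 0x0A00):
        ch = chr(cp)
        # unassigned code points have no name; fall back to "" for those
        if unicodedata.name(ch, "").startswith("BENGALI LETTER "):
            letters.append(ch)
    return letters


def conjunct(*consonants):
    """Join consonants with hasanta, e.g. KA + TA gives the conjunct KTA."""
    return HASANTA.join(consonants)


def coverage_gaps(sample_text):
    """Letters of the block that never occur in sample_text."""
    present = set(sample_text)
    return [ch for ch in bengali_letters() if ch not in present]
```

Running coverage_gaps on the contents of the sample file would list the letters it still lacks. Conjunct coverage would need an explicit list of conjuncts to check against, which this sketch does not supply.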

Deepayan should be able to give some input here. He has working experience
with R and may have some training samples as well.

Cheers,
Golam

--
http://gravity.psu.edu/~hossain/



Re: [Ankur-core] Bangla OCR progress

2008-07-02 Thread Deepayan Sarkar
On 7/2/08, Golam Mortuza Hossain [EMAIL PROTECTED] wrote:
> On Wed, Jul 2, 2008 at 9:32 AM, Sayamindu Dasgupta [EMAIL PROTECTED] wrote:
>
>> This guy seems to be doing some interesting progress for a Bangla OCR
>> - or more precisely, enabling Bangla in Tesseract.
>> http://debayanin.googlepages.com/hackingtesseract

Cool. I had some interaction with the tesseract/ocropus folks, and it
sounded like a good base. It's nice that someone's actually doing
something with it. It takes the old matra removal approach, and he's
facing the same problems I did (notice in his first example that গ is
segmented into 2 parts, and শু is not). On the other hand, having
something that works even partly is a good start.
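
For anyone unfamiliar with the idea, matra removal can be sketched roughly as follows: blank out the densest pixel rows (the headline), then cut the word wherever a column has no ink left. This is a toy version on a 0/1 bitmap with a made-up threshold, not the code from his page or from Tesseract; remove_matra, column_segments and WORD are illustrative names only.

```python
def remove_matra(bitmap, threshold=0.8):
    """Return a copy of bitmap (rows of 0/1, 1 = ink) with headline rows blanked.

    A row counts as part of the matra when its ink count is at least
    `threshold` times the ink count of the densest row (an assumed heuristic).
    """
    row_ink = [sum(row) for row in bitmap]
    peak = max(row_ink, default=0)
    if peak == 0:
        return [row[:] for row in bitmap]
    return [
        [0] * len(row) if ink >= threshold * peak else row[:]
        for row, ink in zip(bitmap, row_ink)
    ]


def column_segments(bitmap):
    """Split the bitmap into (start, end) column ranges separated by empty columns."""
    if not bitmap:
        return []
    width = len(bitmap[0])
    col_ink = [sum(row[x] for row in bitmap) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(col_ink):
        if ink and start is None:
            start = x                      # a new piece begins
        elif not ink and start is not None:
            segments.append((start, x))    # piece ended at an ink-free column
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# Two toy "glyphs" joined only by a headline in the top row.
WORD = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
]
```

With the headline in place, column_segments(WORD) sees one connected piece; after remove_matra(WORD) it finds two. The same cut would also split a single glyph whose strokes meet only at the matra, and would fail to separate two glyphs that touch below it, which may be the kind of problem seen with গ and শু.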

> Yes, it definitely looks interesting.
>
>> Looks like he needs some more training data - can we provide him with some?
>
> If I remember correctly, there was a sample file for testing the completeness
> of Bengali fonts. Since it has all letters and conjuncts typed in, the file
> might be useful for training Tesseract as well.
>
> Deepayan should be able to give some input here. He has working experience
> with R and may have some training samples as well.

Well, we have a bunch of Unicode documents. For some of them, I have
print versions too, and can scan them if needed. A simpler approach
would be to render them using different fonts and take screenshots.

Apparently he also needs some box files, whatever they are, which need
to be produced using Tesseract. I haven't installed Tesseract yet but
will try; let me know if anyone else manages it.
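
For reference, going by the Tesseract training documentation, a box file appears to be a plain-text file with one line per glyph: the glyph itself, then the left, bottom, right and top pixel coordinates of its bounding box, measured from the bottom-left corner of the image (newer Tesseract versions append a page number). A minimal Python sketch for reading such lines, assuming that layout; Box and parse_box_line are names invented here, and the sample coordinates in the note below are made up:

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One glyph and its bounding box, origin at the bottom-left of the image."""
    glyph: str
    left: int
    bottom: int
    right: int
    top: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_line(line: str) -> Box:
    """Parse 'glyph left bottom right top [page]'; a trailing page field is ignored."""
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"not a box line: {line!r}")
    left, bottom, right, top = (int(p) for p in parts[1:5])
    return Box(parts[0], left, bottom, right, top)
```

A line such as "গ 10 20 34 52" would then parse into a box 24 pixels wide and 32 pixels high. Correcting the glyph column by hand for each box is presumably where the manual effort goes.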

-Deepayan