On 5/9/09, Debayan Banerjee <debaya...@gmail.com> wrote:
> 2009/5/9 Deepayan Sarkar <deepayan.sar...@gmail.com>:
>
> > Debayan,
> >
> > I have been meaning to ask you: is your character segmentation
> > algorithm in a form that could be easily separated out?
>
> The segmentation algorithm can be found here
> (http://tesseractindic.googlecode.com/files/clipmatra_pseudocode.pdf)

But this is your original algorithm, which segmented গ etc. (at least
for some fonts), isn't it? I thought you had an improved algorithm
that works around some of those problems (or maybe I misunderstood
your mail).

> > If it could be
> > easily done, I would like to try it out in BOCRA. Unfortunately, I
> > don't think I will have enough time in the near future to figure out
> > how ocropus/tesseract does things.
>
> Kindly read the paragraph in this post
> (http://hacking-tesseract.blogspot.com/2009/05/bengali-stats.html)
> regarding reducing the number of character classes to be trained. I
> want to know if this is possible using BOCRA.

No, it's not. From the beginning, my design for BOCRA was based on the
idea of on-the-fly training, because that's the only approach I
thought was feasible given the combination of non-standard fonts and
so many potential conjuncts. In most realistic examples, the number of
conjuncts that actually occur is quite limited. After accounting for
the most common ones, the frequency of the rest is probably lower than
the normal OCR error rate anyway.
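
To make that last point concrete, here is a rough sketch of the
comparison I have in mind. Everything in it is an assumption for
illustration: the function name, the counts, the glyph total and the
2% error rate are made up, not measured from any corpus or taken from
BOCRA.

```python
from collections import Counter


def residual_rate(conjunct_counts, top_k, total_glyphs):
    """Share of all glyphs that are conjuncts outside the top_k most common ones."""
    if total_glyphs <= 0:
        raise ValueError("total_glyphs must be positive")
    all_conjuncts = sum(conjunct_counts.values())
    covered = sum(n for _, n in conjunct_counts.most_common(top_k))
    return (all_conjuncts - covered) / total_glyphs


# Hypothetical numbers, for illustration only.
counts = Counter({"ক্ষ": 500, "ন্ত": 300, "স্ত": 150, "ঙ্ক": 40, "ষ্ণ": 10})
total_glyphs = 10000        # assumed number of glyphs in the document
assumed_error_rate = 0.02   # assumed baseline OCR error rate

for k in range(len(counts) + 1):
    rest = residual_rate(counts, k, total_glyphs)
    print(k, rest, rest < assumed_error_rate)
```

With numbers like these, handling only the three most common conjuncts
already pushes the remainder below the assumed error rate. Whether
real text behaves this way would have to be checked on real counts.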

-Deepayan

_______________________________________________
Bengalinux-core mailing list
Bengalinux-core@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bengalinux-core
