On 5/9/09, Debayan Banerjee <debaya...@gmail.com> wrote:
> 2009/5/9 Deepayan Sarkar <deepayan.sar...@gmail.com>:
> >
> > Debayan,
> >
> > I have been meaning to ask you: is your character segmentation
> > algorithm in a form that could be easily separated out?
>
> The segmentation algorithm can be found here
> (http://tesseractindic.googlecode.com/files/clipmatra_pseudocode.pdf)

But this is your original algorithm, which segmented গ etc. (at least
for some fonts), isn't it? I thought you had an improved algorithm that
works around some of those problems (or maybe I misunderstood your mail).

> > If it could be
> > easily done, I would like to try it out in BOCRA. Unfortunately, I
> > don't think I will have enough time in the near future to figure out
> > how ocropus/tesseract does things.
> >
>
> Kindly read the paragraph in this
> (http://hacking-tesseract.blogspot.com/2009/05/bengali-stats.html)
> post regarding reducing number of character classes to be trained. I
> want to know if this is possible using BOCRA.

No, it's not. From the beginning, my design for BOCRA was based on the
idea of on-the-fly training, because that's the only approach I thought
was feasible given the combination of non-standard fonts and so many
potential conjuncts.

In most realistic examples, the number of conjuncts that actually occur
is quite limited. After accounting for the most common ones, the
frequencies of the rest are probably lower than the normal OCR error
rate anyway.

-Deepayan

_______________________________________________
Bengalinux-core mailing list
Bengalinux-core@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bengalinux-core