[Ankur-core] Re: announcement: BOCRA - an OCR software for Bangla etc.

Deepayan Sarkar Thu, 06 Apr 2006 09:52:08 -0700

On 4/6/06, Golam Mortuza Hossain <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I would like to ask one question. During OCR operation (assuming
> I have already trained it) what should one do if there is an
> illustration (non-text element) in the scanned image?
>
> For example, it is pretty common to have pictures of King/Queen
> within the text in History books.


Ideally, the software should automatically figure out which part is
text and which part is not (there is some literature, and possibly
even some other open-source OCR implementations, that address how to
do that). Right now, BOCRA is nowhere near that sophisticated. In
fact, it will get confused even by two-column text.

Currently I do this segmentation (as well as skew correction, etc.)
manually when scanning (or just after scanning). In some (distant)
future version of BOCRA, all this will hopefully be automated.
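
For illustration only (this is not how BOCRA works, and not taken from
its source): one textbook way to catch the two-column case is a
vertical projection profile, where a wide run of ink-free pixel
columns inside the text area suggests a gutter. The sketch below
assumes the page is already binarized and deskewed and is given as a
list of rows of 0/1 values; the function name and the min_width
parameter are made up for this example.

```python
def find_column_gutter(image, min_width=3):
    """Return the x position of the widest blank vertical strip, or None.

    image: list of rows, each a list of 0 (background) / 1 (ink).
    min_width: hypothetical threshold; narrower blank strips are ignored.
    """
    if not image:
        return None
    width = len(image[0])

    # Vertical projection profile: amount of ink in each pixel column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    # Only look between the leftmost and rightmost inked columns,
    # so that page margins are not mistaken for a gutter.
    inked = [x for x, count in enumerate(profile) if count > 0]
    if not inked:
        return None
    left, right = inked[0], inked[-1]

    best = None       # (start, end) of the widest blank run, end exclusive
    run_start = None  # start of the blank run currently being scanned
    for x in range(left, right + 1):
        if profile[x] == 0:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            if best is None or (x - run_start) > (best[1] - best[0]):
                best = (run_start, x)
            run_start = None

    if best is None or best[1] - best[0] < min_width:
        return None
    return (best[0] + best[1]) // 2


# Toy page: 4 inked columns, 5 blank columns, 4 inked columns.
two_column_page = [[1] * 4 + [0] * 5 + [1] * 4 for _ in range(6)]
gutter = find_column_gutter(two_column_page)
```

Real scans have noise, skew and illustrations, so an exact-zero test
like this would need smoothing and a tolerance before it is of any
practical use.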

Incidentally, if anyone's interested in the theory behind the approach
(and doesn't want to figure it out from the source code), I have a
write-up. I'm not quite ready to release it publicly yet, but I'm
happy to send it to anyone off-list upon request.

Deepayan
