On 4/6/06, Golam Mortuza Hossain <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I would like to ask one question. During OCR operation (assuming
> I have already trained it) what should one do if there is an
> illustration (non-text element) in the scanned image?
>
> For example, it is pretty common to have pictures of King/Queen
> within the text in History books.

Ideally, the software should automatically figure out which part is text and which part is not (and there is some literature, and possibly even some other open source OCR implementations, that deal with how to do that). Right now, BOCRA is nowhere near that sophisticated. In fact, it will get confused even by two-column text.

Currently I do this segmentation (as well as skew correction, etc.) manually when scanning (or just after scanning). In some (distant) future version of BOCRA, all this would hopefully be automated.

Incidentally, if anyone's interested in the theory behind the approach (and doesn't want to figure it out from the source code), I have a write-up. I'm not quite ready to release it publicly yet, but I'm happy to send it off-list to anyone who asks.

Deepayan