On Tue, 17 Aug 2010 15:57:36 +0200, Antonio Diaz Diaz wrote:

>Tilman Hausherr wrote:
>> Why not accept that some images might really have some very high and
>> very small characters? It's not that unlikely, e.g. with advertisements:
>> "free beer coupon *" in huge characters, and "* not valid in
>> Lampukistan" in very small characters. If you make a real change,
>> there's always the risk that you'd get worse results for the majority of
>> images while solving a problem that almost never happens. Maybe a
>> solution would be, if there are no medium characters, to just add
>> one element that produces a space...
>
>You mean that if the high characters are grouped, put them in one line and
>the short characters in another line? I guess this can be implemented.
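A minimal sketch of the split Antonio describes, assuming a hypothetical Box
type for character bounding boxes (not ocrad's actual data structures): split
one detected line into a "tall" and a "short" line only when the character
heights are clearly bimodal.

// Illustrative sketch only -- hypothetical types, not ocrad's real API.
#include <algorithm>
#include <vector>

struct Box { int left, top, width, height; };   // hypothetical character box

// Split one detected line into tall and short groups when the character
// heights are clearly bimodal (huge headline vs. tiny footnote).
bool split_by_height( const std::vector<Box> & boxes,
                      std::vector<Box> & tall, std::vector<Box> & small )
  {
  if( boxes.size() < 2 ) return false;
  int hmin = boxes[0].height, hmax = boxes[0].height;
  for( const Box & b : boxes )
    { hmin = std::min( hmin, b.height ); hmax = std::max( hmax, b.height ); }
  if( hmax < 3 * hmin ) return false;          // heights not bimodal enough
  const int cut = ( hmin + hmax ) / 2;         // crude midpoint threshold
  for( const Box & b : boxes )
    {
    if( b.height >= cut ) tall.push_back( b );
    else small.push_back( b );
    }
  // A fuller version would refuse to split if any box sits near 'cut',
  // i.e. if medium-height characters exist.
  return !tall.empty() && !small.empty();
  }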
Yes, although my text above is purely theoretical. Currently I have
concentrated on processing the results that I get; I've done almost no
evaluation of the quality of the OCR.

>> On the other hand, I just thought of another "symptom" fix, and it
>> works:
>
>Yes, this works, but as a definitive solution I prefer to remove lines
>which only contain noise.

Yeah, that would be nice. I have observed - although not yet researched
fully - that sometimes noise lines between "good" text lines cause that
text not to be OCRed at all. This happens with images that have grey
areas, and these areas, when scanned, sometimes look like a chess board.
But I need to do more research there.

Tilman

>
>Regards,
>Antonio.

_______________________________________________
Bug-ocrad mailing list
Bug-ocrad@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-ocrad
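A minimal sketch of the "remove lines which only contain noise" idea discussed
above, again with a hypothetical Box type and an assumed median-height
threshold rather than ocrad's actual code: drop a line whose boxes are all far
smaller than the page's typical character height, which is what the
chess-board speckle from dithered grey areas tends to produce.

// Illustrative sketch only -- hypothetical types and threshold.
#include <vector>

struct Box { int left, top, width, height; };   // hypothetical character box

// Treat a line as noise if none of its boxes reaches even a quarter of the
// page's median character height.
bool is_noise_line( const std::vector<Box> & line, int median_char_height )
  {
  if( line.empty() ) return true;
  for( const Box & b : line )
    if( 4 * b.height >= median_char_height ) return false;  // something real
  return true;                     // everything is tiny speckle: drop line
  }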