On Tue, 17 Aug 2010 15:57:36 +0200, Antonio Diaz Diaz wrote:

>Tilman Hausherr wrote:
>> Why not accept that some images might really have some very tall and
>> some very small characters? It's not that unlikely, e.g. with
>> advertisements: "free beer coupon *" in huge characters, and "* not
>> valid in Lampukistan" in very small characters. If you make a real
>> change, there's always the risk that you'd get worse results for the
>> majority of images while solving a problem that almost never happens.
>> Maybe one solution would be, if there are no medium-sized characters,
>> to just add one element that produces a space...
>
>You mean: if the tall characters form a group, put them in one line and
>the short characters in another line? I guess this can be implemented.

Yes, although my text above is purely theoretical. So far I have
concentrated on processing the results that I get; I have done almost no
evaluation of the OCR quality.
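
A minimal sketch of the grouping idea discussed above, assuming each
glyph is a hypothetical (left, top, right, bottom) box and that "no
medium characters" means a large relative gap in the sorted glyph
heights. The names `split_by_height` and `ratio` are illustrative, not
ocrad's API, and the 2.0 threshold is an assumption.

```python
from typing import List, Tuple

# Hypothetical glyph bounding box: (left, top, right, bottom), inclusive pixels.
Box = Tuple[int, int, int, int]


def height(b: Box) -> int:
    return b[3] - b[1] + 1


def split_by_height(boxes: List[Box], ratio: float = 2.0) -> List[List[Box]]:
    """Split one text line into a 'tall' line and a 'short' line when the
    glyph heights form two groups with no medium heights in between.
    `ratio` is an assumed threshold, not an ocrad parameter."""
    if len(boxes) < 2:
        return [sorted(boxes)]
    hs = sorted(height(b) for b in boxes)
    # Find the largest relative jump between consecutive sorted heights.
    best_i, best_ratio = -1, 1.0
    for i in range(len(hs) - 1):
        r = hs[i + 1] / hs[i]
        if r > best_ratio:
            best_i, best_ratio = i, r
    if best_i < 0 or best_ratio < ratio:
        return [sorted(boxes)]  # heights are continuous: keep one line
    cut = hs[best_i]  # glyphs no taller than this go to the 'short' line
    tall = sorted(b for b in boxes if height(b) > cut)
    short = sorted(b for b in boxes if height(b) <= cut)
    # Emit the two lines top to bottom; each line is ordered left to right.
    return sorted([tall, short], key=lambda line: min(b[1] for b in line))
```

Because the split only happens across a single large jump in the sorted
heights, any medium-sized glyph bridging the gap keeps the line intact,
which limits the risk of hurting ordinary images.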

>> On the other hand, I just thought of another "symptom" fix, and it
>> works:
>
>Yes, this works, but as a definitive solution I prefer to remove lines
>that contain only noise.

Yeah, that would be nice.

I have observed (although I have not yet researched it fully) that
noise lines between "good" text lines sometimes cause that text not to
be OCRed at all. This happens with images that have grey areas, which,
when scanned, sometimes look like a chessboard. But I need to do more
research there.
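
One way to "remove lines that contain only noise" could look like the
sketch below. It assumes hypothetical (left, top, right, bottom) glyph
boxes and treats a line as noise when most of its blobs are tiny, as
with the dot patterns from scanned grey areas. `is_noise_line`,
`min_height` and `min_fraction` are illustrative names and values, not
ocrad's actual criterion.

```python
from typing import List, Tuple

# Hypothetical glyph bounding box: (left, top, right, bottom), inclusive pixels.
Box = Tuple[int, int, int, int]


def height(b: Box) -> int:
    return b[3] - b[1] + 1


def is_noise_line(boxes: List[Box], min_height: int = 4,
                  min_fraction: float = 0.5) -> bool:
    """Heuristic (assumed, not ocrad's): a line is noise when fewer than
    `min_fraction` of its blobs are at least `min_height` pixels tall."""
    if not boxes:
        return True
    big = sum(1 for b in boxes if height(b) >= min_height)
    return big / len(boxes) < min_fraction


def drop_noise_lines(lines: List[List[Box]]) -> List[List[Box]]:
    """Keep only the lines that look like real text, preserving order."""
    return [line for line in lines if not is_noise_line(line)]
```

Dropping such lines before recognition would keep a chessboard-like
strip from being merged with, or displacing, the real text lines around
it.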

Tilman

>
>
>Regards,
>Antonio.
>
>_______________________________________________
>Bug-ocrad mailing list
>Bug-ocrad@gnu.org
>http://lists.gnu.org/mailman/listinfo/bug-ocrad
