The quick fix is of course to post-process the result if you know
you can expect numeric characters only. Simply do a search-and-replace
for common letters that you know should be digits: replace every I, l, i with
1 (one), every O, o, Q with 0 (zero), etc.
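A minimal sketch of that search-and-replace in Python, assuming the OCR result is already available as a string. The name fix_numeric and the mapping are hypothetical and cover only the letters mentioned above; nothing here is part of ocrad itself.

```python
# Hypothetical post-processing step for output that should be numeric only.
# The mapping is an assumption: it covers just the letters named above.
DIGIT_FIXES = str.maketrans({
    "I": "1", "l": "1", "i": "1",   # letters commonly mistaken for one
    "O": "0", "o": "0", "Q": "0",   # letters commonly mistaken for zero
})

def fix_numeric(text: str) -> str:
    """Replace letters that are commonly confused with digits."""
    return text.translate(DIGIT_FIXES)

print(fix_numeric("l2O.5o"))  # prints 120.50
```

This is only safe on fields known to contain nothing but numbers; applied to mixed text it would corrupt real letters.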
The best solution (I think) would be to introduce different classifiers in
the recognition engine, but that's a project for the distant future, I suppose.
/Tobias A
From: Antonio Diaz Diaz <[EMAIL PROTECTED]>
To: [email protected]
CC: Manfred Schwarb <[EMAIL PROTECTED]>
Subject: [Bug-ocrad] Re: Feature request: numeric charset
Date: Wed, 08 Jun 2005 15:46:57 +0200
Hello Manfred.
Yes, ocrad will some day have options like "--charset=numeric", or, for
texts without numbers, "--charset=alphabetic". A user-defined charset
will probably be implemented as well.
Of course, it will be implemented sooner if someone offers to sponsor it.
;-)
Regards,
Antonio.
Manfred Schwarb wrote:
Trying to recognize numbers in tables, I stumbled across
the usual OCR hassle:
Zero is recognized as "O" or "o", One is recognized as lowercase "L" or
uppercase "i".
I think ocrad is doing its best, and the results are great.
Nevertheless, such misrecognitions do occur; they are inevitable, I think.
This could be avoided if there were a "--charset=numbers" or similar option,
which restricts the charset to [0123456789], and perhaps [+-].
Alternatively, one could even think of an option
--charset="0123456789", i.e. a list of characters out of the
ASCII character set.
What do you think?
_______________________________________________
Bug-ocrad mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-ocrad