[Bug-ocrad] A few ocrad problems

Don Moir Sat, 01 Jun 2013 18:37:51 -0700

Hello,

I am a developer and just started using ocrad. I am using ocrad-0.22-rc2.


Results so far look good, and better than the other sources I have tried.

Here are some problems I have found:

1) An orphan capital letter I fails to be detected. The current code checks 
for an AlphaNum before or after, but spaces are not taken into account.

So for example, if you have a<space>|<space>b, the I is not detected as a 
capital letter I and is left as a vertical bar. So when setting lcode and rcode 
in Textline::recognize2 while checking a vertical bar, you need to skip the 
spaces before and after to see what lcode and rcode should be set to.

2) I have an example with the word UP in it. This is detected as uP (lower-case 
u).

3) Failure to detect a space character in latin_space.pbm. The words "como 
jamás" are detected as "comojamás"; otherwise the recognition is perfect here.

4) Failure to detect merged ti, vi, im, ll in merged_ti_vi_im_ll.pbm
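
To illustrate item 1, here is a minimal sketch of the space-skipping lookup. 
It is not ocrad's actual code and does not use its data structures: the line is 
modeled as a plain std::string, and left_neighbor, right_neighbor and 
bar_is_capital_I are hypothetical names. Only the idea carries over: skip 
spaces on both sides of the vertical bar before deciding what lcode and rcode 
are.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of ocrad): nearest non-space character to
// the left of position i, or 0 if there is none.
int left_neighbor(const std::string& line, std::size_t i) {
    while (i > 0) {
        --i;
        if (line[i] != ' ') return static_cast<unsigned char>(line[i]);
    }
    return 0;
}

// Hypothetical helper (not part of ocrad): nearest non-space character to
// the right of position i, or 0 if there is none.
int right_neighbor(const std::string& line, std::size_t i) {
    for (++i; i < line.size(); ++i)
        if (line[i] != ' ') return static_cast<unsigned char>(line[i]);
    return 0;
}

// Treat '|' as a capital I when the character before or after it,
// ignoring any spaces in between, is alphanumeric.
bool bar_is_capital_I(const std::string& line, std::size_t i) {
    if (i >= line.size() || line[i] != '|') return false;
    const int lcode = left_neighbor(line, i);
    const int rcode = right_neighbor(line, i);
    return (lcode != 0 && std::isalnum(lcode)) ||
           (rcode != 0 && std::isalnum(rcode));
}
```

With this sketch, bar_is_capital_I("a | b", 2) is true, while a bar surrounded 
only by punctuation, as in "- | -", stays a vertical bar.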

The attached zip contains 6 files:

cap_I_and_UP.pbm (for items 1 and 2)
cap_I_and_UP.txt

latin_space.pbm (for item 3)
latin_space.txt

merged_ti_vi_im_ll.pbm (for item 4)
merged_ti_vi_im_ll.txt

ocrad is working better for me than anything else so far, so it looks very 
promising.

I am wondering whether possible merged characters, like TT, ti, etc., should be 
added as special characters, so that in the future it is easy to add such 
combinations.
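
One way this could look, purely as a sketch: give each merged pair its own 
code and expand it to the underlying letters on output. Everything here is an 
assumption for illustration. The Merged_code names, the choice of the Unicode 
Private Use Area (which starts at U+E000) for the codes, and the expand_merged 
function are hypothetical and not part of ocrad.

```cpp
#include <map>
#include <string>

// Hypothetical codes for merged glyphs (not ocrad's). They are placed in the
// Unicode Private Use Area so they cannot collide with real characters.
enum Merged_code {
    MERGED_TI = 0xE000,
    MERGED_VI,
    MERGED_IM,
    MERGED_LL,
    MERGED_TT
};

// Expand a merged-glyph code into the letters it stands for.
// Returns an empty string if the code is not a known merged glyph.
std::string expand_merged(int code) {
    static const std::map<int, std::string> table = {
        {MERGED_TI, "ti"},
        {MERGED_VI, "vi"},
        {MERGED_IM, "im"},
        {MERGED_LL, "ll"},
        {MERGED_TT, "TT"},
    };
    const auto it = table.find(code);
    return it == table.end() ? std::string() : it->second;
}
```

Adding a new combination would then be one enum entry plus one table row.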

_______________________________________________
Bug-ocrad mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-ocrad
