Hello, I am a developer and just started using ocrad. I am using ocrad-0.22-rc2.
Results so far look good, and better than the other programs I have tried. Here are some problems I have found:

1) An orphan capital letter I fails to be detected. The current code checks for an AlphaNum before or after, but spaces are not taken into account. So for example if you have:

   a<space>|<space>b

   the I is not detected as a capital letter I and is left as a vertical bar. So when setting lcode and rcode in Textline::recognize2 while checking a vertical bar, you need to skip the spaces before and after it to see what lcode and rcode need to be set to.

2) I have an example with the word UP in it. This is detected as uP (lower case u).

3) Failure to detect a space character in latin_space.pbm. The words "como jamás" are detected as "comojamás"; otherwise the recognition is perfect here.

4) Failure to detect merged ti, vi, im, ll, in merged_ti_vi_im_ll.pbm.

The attached zip contains 6 files:

   cap_I_and_UP.pbm (for items 1 and 2)
   cap_I_and_UP.txt
   latin_space.pbm (for item 3)
   latin_space.txt
   merged_ti_vi_im_ll.pbm (for item 4)
   merged_ti_vi_im_ll.txt

ocrad is working better for me than anything else so far, so it looks very promising. I am wondering whether possible merged characters, like TT, ti, etc., should be added as special characters, so that in future it is easy to add such combinations.
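To illustrate the idea in item 1, here is a minimal standalone sketch. It is not ocrad's actual code: it works on a plain string rather than on ocrad's character objects, and the helper names (left_neighbor, right_neighbor, fix_orphan_bars) are hypothetical. It only shows how lcode and rcode could be taken from the nearest non-space neighbors instead of the immediately adjacent characters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ocrad): nearest non-space character
// to the left of pos, or 0 if there is none.
inline unsigned char left_neighbor(const std::string& line, std::size_t pos)
{
    while (pos > 0) {
        --pos;
        if (line[pos] != ' ') return static_cast<unsigned char>(line[pos]);
    }
    return 0;
}

// Hypothetical helper (not part of ocrad): nearest non-space character
// to the right of pos, or 0 if there is none.
inline unsigned char right_neighbor(const std::string& line, std::size_t pos)
{
    for (++pos; pos < line.size(); ++pos)
        if (line[pos] != ' ') return static_cast<unsigned char>(line[pos]);
    return 0;
}

// Replace a vertical bar by a capital I when the nearest non-space
// character on either side is alphanumeric. The "either side" rule is an
// assumption based on the description above, not ocrad's real test.
inline std::string fix_orphan_bars(std::string line)
{
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] != '|') continue;
        const unsigned char lcode = left_neighbor(line, i);   // spaces skipped
        const unsigned char rcode = right_neighbor(line, i);  // spaces skipped
        if (std::isalnum(lcode) || std::isalnum(rcode)) line[i] = 'I';
    }
    return line;
}
```

With this sketch, fix_orphan_bars("a | b") turns the bar into an I, while a bar with no alphanumeric neighbor on either side is left alone.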
