On Thu, 30 Jan 2003, Christopher Sawtell wrote:
> On Thu, 30 Jan 2003 17:39, Rex Johnston wrote:
> > As for OCR, well last time i looked, it did actually suck.
> > Anyone used clara?
>
> In a word, hopeless. You have to train it for every font you are
> going to use, and even then it only works more than half reliably if
> the print quality is absolutely exceptional.
Yes, that is exactly what clara is supposed to be, and as such it works
fine, provided you use it for its intended kind of use, i.e. scanning
whole books, where you use the first two pages for training and then
start production. I have tried this on a 102-year-old book which is
neither printed very sharply nor on smooth, white paper.

However, if you want to do something quickly, clara is not the thing
you want to use. And you should not economize on scan resolution and
file size if you want to use clara. Not being an OCR expert, I found I
had to read the manual to get things like joining broken characters and
tuning the recognition right. But once you get it tuned properly, it
works all right.

The other approach (was it called unifont?) is used in gocr. No
training is required, but if the font you have is far from what gocr
knows, it does not work so well. Still, it is always worth a try,
because it is easy to use and fast, and you see quickly whether it
works with your scans. I once compared gocr results with the ones I got
from the OCR program that came with an HP scanner (was that Omnipage?),
and there was not much difference. There were some pages with Times and
Helvetica in different sizes, all on one page, and neither gocr nor the
commercial program got everything right. I did not try clara on these
documents, because clara is made for long documents with only one font.
Of course, you can train clara to recognize more than one font on a
page, but that means more training, and the results will not
necessarily get better.

> To write an OCR app which works properly is, imho, beyond the means
> of the usual kind of Linux project. The road is absolutely littered
> with failed attempts. I looked into the subject fairly throughly a

What is the "usual kind of Linux project"?
With all the variety of open source software, ranging from three-line
perl scripts, through medium-scale apps like GIMP, to absolutely
professional-quality simulation packages like Ptolemy (which is, IMHO,
technically better than the commercial competitors, who sell
single-user licenses at prices well into the tens of thousands of
dollars), there does not seem to be a "typical" project. And the open
source programmers behind them have a similarly wide variety of
experience and education levels.

The fact that software is freely available says nothing about its
quality. Neither does the fact that you pay dearly for commercial
software. I have seen commercial packages costing between 200 k$ and
500 k$ per single-user license simply not do what the offer / technical
spec said. The bad thing about software is that the matter can be too
complex even for experts to judge before buying, even after some
testing; and in many cases the license disclaimers would at most allow
you to return the software and reclaim your money, but they will never
cover the damage caused!

> year or two ago because I have a friend who is rather severely
> handicapped visually, and I wanted to make him a book reader. I
> ended up reading the book aloud to him.

Good on you! At least he got it read in much better quality than any
machine could possibly have managed. BTW, I just returned a 5-CD set of
Douglas Adams' "The Restaurant at the End of the Universe", read by the
author himself, to the public library.

And one more point concerning e-texts and open software: before you
scan a book, always check Project Gutenberg first to see whether
someone has already done the scanning & OCR. But that is certainly
nothing new to you :-)

Cheers,

Helmut.

+----------------+
| Helmut Walle |
| [EMAIL PROTECTED] |
+----------------+
