On Mon, 26 Jan 2009 14:51:14 -0800, Gary Kline <kl...@thought.org> wrote:
> Still, before I get back to the last few pages of my thesis, maybe I'll
> try feeding parts of my most vanilla image-PDF file to an
> opensource OCR program. I'm pretty sure there are a couple in
> ports. IIRC, though, the images have to be jpegs or tiffs or the
> like. If anybody knows, please give me a shout out!

The best idea is to use a format that has no artifacts from lossy image compression (DCT or similar algorithms), read: "real black-and-white pictures" (1-bit color). JPEG is not such a format. You can see this by magnifying the area around the text: it is gray and looks "dusty". TIFF, GIF and PNG are certainly better formats for feeding images into an OCR program.

(Background: A long time ago, I knew a man who did electronics and printed circuit boards. In order to save hard disk space, he converted his 1-bit BMP images of the schematics and the PCB layouts to JPEG format instead of just zipping, raring or arjing them. He was very unhappy to see them come out of the printer "so dirty, partially unreadable", even though it was a high-quality office-class laser printer. And when he took the PCBs out of the acid bath, their previously photochemically treated surface looked strange and had holes in the copper, so the boards were ready to be thrown away. This man was very upset when he was told about DCT and artifacts. Later on, he used GIF images and was happy again.)

-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
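
As a rough illustration of the point above about 1-bit images and lossless compression, here is a minimal Python sketch using only the standard library. The pixel values, the cutoff of 128 and the helper names `threshold` and `pack_bits` are assumptions made up for this example; they do not come from any OCR program. It turns a tiny "dusty" grayscale scan into pure black and white, then shows that zip-style (zlib) compression gives back exactly the same bits.

```python
import zlib

def threshold(gray_rows, cutoff=128):
    """Map 8-bit gray values to 1 bit: 1 = black (ink), 0 = white (paper)."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray_rows]

def pack_bits(bit_rows):
    """Pack each row of 0/1 values into bytes, most significant bit first.
    Rows are padded with zeros up to a whole number of bytes."""
    out = bytearray()
    for row in bit_rows:
        padded = row + [0] * (-len(row) % 8)
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
    return bytes(out)

# A made-up 4x8 "scan": dark strokes (low values) on light paper, with
# in-between grays of the kind lossy compression leaves around edges.
scan = [
    [250, 240,  20,  15,  30, 235, 250, 255],
    [255, 200,  10, 180, 190,  25, 245, 250],
    [245, 210,  12, 170, 200,  18, 240, 255],
    [250, 230,  25,  20,  22, 225, 250, 250],
]

bits = threshold(scan)      # every pixel is now pure black or pure white
packed = pack_bits(bits)    # 1 bit per pixel, 1 byte per 8-pixel row

# Lossless compression round-trips the bitmap bit for bit.
restored = zlib.decompress(zlib.compress(packed))
assert restored == packed

# Show the 1-bit image as text: '#' is black, '.' is white.
for row in bits:
    print("".join("#" if b else "." for b in row))
```

A fixed cutoff is the simplest possible choice; a real scan may need a cutoff tuned to the page, and thresholding cannot bring back detail that lossy compression has already destroyed.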