On 7/2/08, [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:

>I'd like to use PIL to prep an image file to improve OCR quality.
>
>Specifically, I need to filter out all but black pixels from the image (i.e., 
>convert all non-black pixels to white while retaining the black pixels).
>
>Can someone please direct me to the appropriate PIL function/method to 
>accomplish this along with a brief description of the correct arguments to use?

I don't have the exact arguments to use, but in my opinion getting the best
results takes a more involved process than a single threshold: enhance the
grayscale image first, and only then reduce it to a bi-level (black and white)
image.
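For the literal request in the question (black stays black, everything else
goes white), here is a minimal sketch. It assumes Pillow, the maintained fork
of PIL. The function name keep_only_black and the cutoff parameter are
hypothetical. cutoff=0 keeps only pure black, and a small positive value also
keeps near-black scanner noise. The work is done by Image.point with a
256-entry lookup table and mode="1", which produces a 1-bit bitmap directly
from an "L" (8-bit grayscale) image.

```python
from PIL import Image


def keep_only_black(im: Image.Image, cutoff: int = 0) -> Image.Image:
    """Bi-level copy of `im`: gray levels <= cutoff become black (0),
    everything else becomes white (255).

    cutoff=0 keeps only pure black; a small positive value (an assumption
    to tune per scanner) also keeps near-black pixels.
    """
    gray = im.convert("L")  # collapse color/palette images to 8-bit gray
    # Lookup table: index = input gray level, value = output level.
    lut = [0 if level <= cutoff else 255 for level in range(256)]
    # Mapping an "L" image through the table with mode="1" yields a bitmap.
    return gray.point(lut, mode="1")
```

Usage, with hypothetical file names:
keep_only_black(Image.open("page.png"), cutoff=40).save("page_bw.png").
Color input goes through convert("L"), which uses the ITU-R 601-2 luma
weights. A dark colored pixel can therefore fall under the cutoff and be kept
as black.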

The best results I have seen come from this sequence:

1. Apply a moderately strong 'S' curve with sharp shoulders.
2. Apply two passes of unsharp masking: the first with a large aperture, the
   second with a smaller aperture and lower intensity.
3. Map the result to the bit depth the OCR software requires (usually by
   thresholding it into a bitmap).
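Here is a rough Pillow sketch of that sequence. The post gives no exact curve
or numbers, so all of these are assumptions to tune against real scans:

- The 'S' curve is a logistic function rescaled to 0..255, used as a lookup
  table.
- The two ImageFilter.UnsharpMask passes use radius 8 at 150% and radius 1 at
  60%.
- The final threshold is 128.

The function names s_curve_lut and prep_for_ocr are hypothetical.

```python
import math

from PIL import Image, ImageFilter


def s_curve_lut(steepness: float = 10.0, midpoint: int = 128) -> list:
    """256-entry lookup table for a logistic 'S' curve (an assumed shape).

    Larger `steepness` gives sharper shoulders: mid-tones are spread apart
    while near-black and near-white levels are pushed to the ends.
    """
    def logistic(level: int) -> float:
        return 1 / (1 + math.exp(-steepness * (level - midpoint) / 255))

    lo, hi = logistic(0), logistic(255)
    # Rescale so input 0 maps to 0 and input 255 maps to 255.
    return [round(255 * (logistic(v) - lo) / (hi - lo)) for v in range(256)]


def prep_for_ocr(im: Image.Image, threshold: int = 128) -> Image.Image:
    """Curve, two unsharp-mask passes, then a 1-bit bitmap.

    All numeric settings are hypothetical starting points, not tested values.
    """
    gray = im.convert("L")              # work on 8-bit grayscale
    curved = gray.point(s_curve_lut())  # 1. 'S' curve via lookup table
    # 2. First unsharp mask: large aperture (radius), strong amount.
    sharp = curved.filter(
        ImageFilter.UnsharpMask(radius=8, percent=150, threshold=0))
    # 3. Second unsharp mask: smaller aperture, lower intensity.
    sharp = sharp.filter(
        ImageFilter.UnsharpMask(radius=1, percent=60, threshold=0))
    # 4. Threshold into a bitmap: above `threshold` -> white, else black.
    lut = [255 if v > threshold else 0 for v in range(256)]
    return sharp.point(lut, mode="1")
```

The curve runs before sharpening so that the unsharp passes work on text that
already has strong contrast. Thresholding comes last so that both passes still
operate on gray levels rather than on a bitmap.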

Another trick, if you have the time, is to scan at a higher resolution in
integer multiples (e.g. 2x, 3x, 4x, so interpolation doesn't interfere),
process the image as described, and then reduce it to the optimum OCR
resolution. I have to admit this is from a while ago; I'm not sure what the
current state of affairs is with OCR software (it's been 10 years, if a day,
since I used any).
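A sketch of the oversampling idea, with these assumptions:

- Pillow 7.0 or later, for Image.reduce, which averages each
  factor x factor block of pixels.
- A single unsharp pass, with a radius scaled by the factor, stands in for
  the full curve plus two-pass sharpening.
- The threshold is applied after the reduction. That is one reading of
  "process, then reduce", since it lets the averaging see gray levels instead
  of a bitmap.

The function name oversample_then_reduce is hypothetical.

```python
from PIL import Image, ImageFilter


def oversample_then_reduce(hi_res: Image.Image, factor: int,
                           threshold: int = 128) -> Image.Image:
    """Enhance a scan made at `factor` x the target OCR resolution, reduce
    it by that integer factor, then threshold to a 1-bit bitmap."""
    if isinstance(factor, bool) or not isinstance(factor, int) or factor < 1:
        raise ValueError("factor must be a positive integer (2, 3, 4, ...)")
    gray = hi_res.convert("L")
    # Stand-in for the full enhancement (curve + two unsharp passes): one
    # unsharp mask whose radius scales with the oversampling factor.
    enhanced = gray.filter(
        ImageFilter.UnsharpMask(radius=2 * factor, percent=150, threshold=0))
    # Image.reduce averages whole factor x factor blocks of input pixels,
    # so no fractional-position resampling is involved.
    low_res = enhanced.reduce(factor)
    # Threshold only after reducing, so the averaging still sees gray levels.
    lut = [255 if level > threshold else 0 for level in range(256)]
    return low_res.point(lut, mode="1")
```

If the scan's width or height is not an exact multiple of the factor,
Image.reduce rounds the output size up and the last row or column comes from a
partial block. Cropping to a multiple first avoids that.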

Scott
_______________________________________________
Image-SIG maillist  -  Image-SIG@python.org
http://mail.python.org/mailman/listinfo/image-sig
