Re: Converting Documents

Paul Koning via cctalk Thu, 09 Apr 2020 07:24:22 -0700

> On Apr 9, 2020, at 10:16 AM, emanuel stiebler via cctalk 
> <[email protected]> wrote:
> 
> Hi All,
> somebody scanned documents for me in .pdfs.
> Looking into them, they are pages of jpgs embedded in .pdf ..
> (100 pages resulting in 350MBytes ...)
> 
> Any easy way to convert them into some b/w .pdf file?
> It is all text, no drawings ...
> 
> Pointers?
> 
> Thanks

A good source of information is Al Kossow's Bitsavers archive, the section 
where he describes the tools he uses.

It's very unfortunate your original scan files are JPG; those are the wrong 
format for text or line art -- JPG is ONLY for photographs and similar 
continuous tone images.  TIFF or PNG or B/W FAX formats are all superior, and 
often more compact.
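
To give a rough idea of why a bilevel format is so much smaller, here is a 
minimal Python sketch of the thresholding step.  It is only an illustration, 
not one of Al's tools: the function names, the fixed threshold of 128, and the 
choice of raw PBM ("P4") as the output container are all my own assumptions.  
It takes 8-bit grayscale pixels and packs them eight to a byte.

```python
def to_bilevel(gray, width, height, threshold=128):
    """Pack 8-bit grayscale pixels (row-major) into 1-bit-per-pixel rows.

    Hypothetical helper for illustration. Pixels darker than `threshold`
    become black (bit = 1, the PBM convention). Each row is padded to a
    whole number of bytes, most significant bit first.
    """
    if len(gray) != width * height:
        raise ValueError("pixel count does not match width * height")
    row_bytes = (width + 7) // 8          # bytes needed for one packed row
    out = bytearray(row_bytes * height)   # starts as all white (0 bits)
    for y in range(height):
        for x in range(width):
            if gray[y * width + x] < threshold:
                out[y * row_bytes + x // 8] |= 0x80 >> (x % 8)
    return bytes(out)


def pbm_file(gray, width, height, threshold=128):
    """Return the bytes of a raw PBM (P4) file for the thresholded image."""
    header = b"P4\n%d %d\n" % (width, height)
    return header + to_bilevel(gray, width, height, threshold)
```

Before any compression the packed result is one eighth the size of the 8-bit 
grayscale data, and the fax-style encodings shrink clean text pages further.  
A fixed threshold is the crudest possible choice; real scans usually need it 
tuned per document.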

If by "convert to b/w" you mean to b/w images, Al's tools will help.  If you 
mean extracting the actual text, that's a different matter: you need an OCR 
tool.  There are good commercial OCR programs around.  No open source ones that 
I know of; I've seen one but it didn't work well enough to be worth the 
trouble.  OCR may be extremely effective or not work at all, depending on the 
quality of the material.  In really extreme cases you may have to type things 
in by hand; I've done that with 600 pages of blurry listings because there was 
a good reason to go to that effort.

        paul