> On Aug 27, 2021, at 4:50 PM, Antonio Carlini via cctalk 
> <cctalk@classiccmp.org> wrote:
> 
> I have a few manuals to scan and I'm looking for suggestions, about how to 
> add bookmarks and how to handle colour.
> ...
> For photographs or shaded areas that don't necessarily come out well under 
> those settings, I plan to use 8-bit greyscale. I'd prefer to use 600dpi but I 
> may have to fall back to 300dpi if the per-page file size shoots up too much.

Since the photos are printed as halftones (black dots of various sizes), you 
may get moiré patterns or other odd scan artifacts, depending on the resolution 
used.  Some scan programs have a tool (often called "descreening") that converts 
a halftone image to the equivalent grayscale image; such a tool is likely to 
help.
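To make the descreening idea concrete, here is a minimal Python sketch. It is not any particular scanner program's algorithm. It assumes the scan is a 2-D list of ints (0 = black ink, 255 = white paper) and replaces each pixel with the mean of a small window. The name box_descreen and the window size are hypothetical, chosen for illustration. When the window spans a whole halftone cell, the dot pattern averages out to a flat gray.

```python
def box_descreen(img, size):
    """Replace each pixel with the mean of a size x size window (box filter).

    img: 2-D list of ints, 0 = black ink, 255 = white paper (assumed layout).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Window covers rows y..y+size-1 and columns x..x+size-1,
            # truncated at the bottom and right edges of the image.
            ys = range(y, min(h, y + size))
            xs = range(x, min(w, x + size))
            total = sum(img[yy][xx] for yy in ys for xx in xs)
            row.append(total // (len(ys) * len(xs)))
        out.append(row)
    return out


# A hypothetical 50% halftone: a checkerboard of black and white dots,
# so one halftone cell is 2 x 2 pixels.
halftone = [[0 if (x + y) % 2 == 0 else 255 for x in range(8)] for y in range(8)]
gray = box_descreen(halftone, size=2)
print(gray[3][3])  # prints 127: the dots have become a uniform mid-gray
```

The anchored window shifts the image by about half a window and produces edge effects along the bottom and right borders. Real descreening filters are typically centered and smoother (for example Gaussian), and the result is often downsampled afterwards.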

> The real issue is colour. I know that various people have looked at the issue 
> of how to efficiently scan pages that are mostly black and white but have 
> some coloured text (RSX-11 manuals and early VMS manuals did this to 
> highlight terminal input, for example). I don't think this is a solved 
> problem and I'm not expecting a solution, what I'm really looking for is to 
> check that what I'm about to produce will have all the information that a 
> future efficient algorithm is likely to need.
> 
> I'm going to start by scanning the whole manual as though it had no colour 
> (so 600 dpi bilevel G4 encoded, except for pages with photos and shading and 
> so on). Then I'm going to go back and rescan the pages that have colour and 
> scan those at 600 dpi and save as a JPG.

JPG is the wrong tool for pages with color text or color line art.  As I've 
mentioned before, JPG is fit ONLY for photos, not for any image with hard 
edges.  Text compressed with JPG suffers badly: the edges of the characters 
pick up fuzzy halos and speckle artifacts.
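Here is one way to see why hard edges suffer. JPEG transforms each 8x8 block with a discrete cosine transform (DCT) and then quantizes the coefficients, which discards most of the high-frequency detail. The Python sketch below does this in one dimension on 8 samples. As a crude stand-in for real quantization, it simply zeroes the upper half of the coefficients; this is an assumption for illustration, not what a JPEG encoder literally does. A flat, photo-like block survives intact. A white-to-black edge comes back with overshoot and ripples (ringing) around the transition.

```python
import math


def _scale(k, n):
    # Orthonormal DCT scaling factors.
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(block):
    """Orthonormal 1-D DCT-II (JPEG applies the 2-D version to 8x8 blocks)."""
    n = len(block)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(block))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct() (the orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, c in enumerate(coeffs))
            for i in range(n)]


def lossy(block, keep):
    """Zero every coefficient at index >= keep, then invert.
    Crude stand-in (assumption) for JPEG's coarse quantization of high
    frequencies; a real encoder divides by a quantization table and rounds."""
    coeffs = dct(block)
    return idct([c if k < keep else 0.0 for k, c in enumerate(coeffs)])


edge = [255] * 4 + [0] * 4   # white paper meeting a black stroke
flat = [200] * 8             # a smooth, photo-like area

for name, block in (("edge", edge), ("flat", flat)):
    out = lossy(block, keep=4)
    err = max(abs(a - b) for a, b in zip(block, out))
    print(name, [round(v) for v in out], "max error:", round(err, 1))
```

The flat block loses nothing because all of its energy is in the lowest coefficient. A sharp edge spreads its energy across every frequency, so discarding the high ones necessarily distorts it.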

For material such as the RSX manuals you mentioned, the tool needed is a 
compression algorithm that handles color with hard edges faithfully.  Basically 
that means a lossless compression scheme.  That should be fine, since pages 
like that should compress very well, at least if the scan has been touched up 
just a bit to make the page background reasonably pure white.  With more effort 
it would be possible to reconstruct the original three-color material (white, 
black, red or whatever), but that's a fair amount harder and probably not 
necessary for adequate compression.  But please, make it a practice to avoid 
JPG except in those cases (rare or non-existent in document scanning work) 
where you're actually dealing with a continuous-tone photograph.
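The background cleanup step can be sketched in Python under a few stated assumptions. Pixels are (r, g, b) tuples. "Near white" means every channel is at or above a hypothetical threshold. zlib's DEFLATE, the lossless compressor used inside PNG, stands in for whatever lossless format you actually use. The strip of noisy off-white paper with some black and red text is synthetic and made up for illustration. Forcing the noise to pure white is itself a deliberate, lossy touch-up. Everything after that step is lossless, and the cleaned data compresses far better.

```python
import random
import zlib


def whiten_background(pixels, threshold):
    """Force every near-white pixel to pure white.

    pixels: list of (r, g, b) tuples; threshold is a hypothetical tuning knob.
    Pixels with any channel below threshold (ink) are left untouched.
    """
    return [(255, 255, 255) if min(p) >= threshold else p for p in pixels]


def deflate_size(pixels):
    """Size in bytes of the raw RGB data after lossless DEFLATE compression."""
    return len(zlib.compress(bytes(c for p in pixels for c in p), 9))


# Hypothetical scanned strip: off-white paper with sensor noise,
# plus a run of black text and a run of red text.
rng = random.Random(1)
page = [(rng.randint(225, 255), rng.randint(225, 255), rng.randint(225, 255))
        for _ in range(20000)]
page[5000:5400] = [(0, 0, 0)] * 400
page[9000:9400] = [(200, 0, 0)] * 400

clean = whiten_background(page, threshold=220)
print("noisy:", deflate_size(page), "bytes  cleaned:", deflate_size(clean), "bytes")
```

The threshold has to sit below the paper tone but above any pale ink. Otherwise, light colored text will be wiped out along with the noise.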

        paul
