On March 1, 2021 12:50:35 PM GMT+01:00, Wols Lists <antli...@youngman.org.uk> wrote:
>I've got a bunch of scans, let's assume they're text documents. And
>they're rather big ... I want to email them.
>
>How on earth do I convert them to TRUE b&w documents? At the moment they
>are jpegs that weigh in at 3MB, and I guess they're using about 5 bytes
>to store all the colour, luminance, whatever, per pixel. But actually,
>there's only ONE BIT of information there - whether that pixel is black
>or white.
>
>I'm using imagemagick, but so far all my attempts to strip out the
>surplus information have resulted in INcreasing the file size ???
>
>So basically, how do I save an image as "one bit per pixel" like you'd
>think you'd send to a B&W printer?
>
>Even at 300dpi, I make that 300*300/8 ~= 10KB/in^2 or 800KB of
>uncompressed info for a page of A4, not 3MB.
>
>Cheers,
>Wol
>
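
As a rough check on the size estimate quoted above, here is a minimal Python sketch of the uncompressed one-bit-per-pixel arithmetic. The function names are made up for illustration, and it assumes an A4 page of 210 x 297 mm with each pixel row padded to a whole byte, which is common in bitmap formats but not universal.

```python
# Sketch: uncompressed size of a 1-bit-per-pixel scan.
# Assumptions (not from the original post): A4 is 210 x 297 mm, and each
# pixel row is padded up to a whole number of bytes.

MM_PER_INCH = 25.4


def bytes_per_square_inch(dpi):
    """Bytes needed for one square inch at 1 bit per pixel."""
    return dpi * dpi // 8


def bilevel_page_bytes(width_mm, height_mm, dpi):
    """Uncompressed size in bytes of a 1-bit page of the given dimensions."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    row_bytes = (width_px + 7) // 8  # round each row up to a whole byte
    return row_bytes * height_px


if __name__ == "__main__":
    print(bytes_per_square_inch(300))         # 11250
    print(bilevel_page_bytes(210, 297, 300))  # 1087480
```

Under those assumptions a full A4 page at 300dpi comes to about 1.1MB uncompressed, a bit more than the 800KB estimate but still well under the 3MB JPEGs, and that is before any compression is applied.
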
Have you tried optical character recognition software such as Tesseract[1]?

1. https://github.com/tesseract-ocr/tesseract

--
Hund