Hello,

Thanks for the hocr2pdf tool; it's really useful and I use it a lot. I have now discovered that hocr2pdf seems to convert PNG images to JPEG before embedding them, and I would like to suggest that it not do so.

I reduce the file size of scanned documents (TIFFs) with:

convert document.tiff -level 25% -colors 64 document.png

Since the documents are mostly black and white, such a file is a lot smaller than the corresponding JPEG, and the PNG compression itself is lossless.

After OCR'ing, however, when I use hocr2pdf to create a PDF, the image is converted to JPEG and the file becomes larger. This can be seen by running pdfimages on the PDF. PDF does support losslessly compressed images in a format similar to PNG (Flate-compressed image streams), so it would save me a lot of disk space if that were used instead of converting to JPEG.
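
To illustrate why lossless embedding should pay off for this kind of image: PDF's FlateDecode filter uses the same Deflate compression as PNG. Below is a minimal Python sketch, not hocr2pdf code. It assumes a synthetic, mostly white page of palette indices instead of a real scan, and the names make_page and flate_roundtrip are made up for this example.

```python
import zlib

# Hypothetical stand-in for a scanned page: one byte per pixel holding a
# palette index, mostly white (0) with regular dark "text" runs (1).
WIDTH, HEIGHT = 600, 800


def make_page(width, height):
    """Build synthetic raw pixel data; this is an assumption, not a real scan."""
    rows = []
    for y in range(height):
        row = bytearray(width)  # all zeros = white background
        if y % 20 < 8:  # a band of "text" every 20 rows
            for x in range(50, width - 50, 12):
                row[x:x + 6] = b"\x01" * 6
        rows.append(bytes(row))
    return b"".join(rows)


def flate_roundtrip(raw):
    """Compress with Deflate (zlib) and decompress again."""
    compressed = zlib.compress(raw, 9)
    return compressed, zlib.decompress(compressed)


page = make_page(WIDTH, HEIGHT)
compressed, restored = flate_roundtrip(page)

print(f"raw: {len(page)} bytes, deflate: {len(compressed)} bytes")
print("lossless:", restored == page)
```

The round trip returns the pixel data byte for byte, which JPEG cannot guarantee. How much a real scan shrinks depends on its content; the synthetic page here is far more regular than scanned text.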

My command lines are:
tesseract document.tiff document -l deu hocr
hocr2pdf -i document.png -o document.pdf < document.html
pdfimages -list document.pdf

Regards,
Marcel


