Hi Marcel,

On Jan 2, 2014, at 12:33, Marcel Martin <[email protected]> wrote:

> thanks for the hocr2pdf tool; it's really useful and I use it a lot. I have now 
> discovered that hocr2pdf seems to convert PNG images to JPEG before embedding 
> them and I would like to suggest that it does not do so.
> 
> I reduce the file size of scanned documents (TIFFs) with:
> 
> convert document.tiff -level 25% -colors 64 document.png
> 
> Since the documents are mostly black and white, such a file is a lot smaller 
> than the corresponding JPEG and the compression is even lossless.
> 
> After OCR'ing, however, when I use hocr2pdf to create a PDF, the image is 
> converted to JPEG and the file becomes larger. This can be seen by running 
> pdfimages on the PDF. PDFs do support images in a similar format to PNG so it 
> would save me a lot of disk space if that were used instead of the conversion 
> to JPEG.
> 
> My command lines are:
> tesseract document.tiff document -l deu hocr
> hocr2pdf -i document.png -o document.pdf < document.html
> pdfimages -list document.pdf


Thank you for your email. I went through old emails and quickly added --compress 
and --quality options to the hocr2pdf frontend. I hope --compress flate produces 
reasonable results with your file. If not, let me know with an example image and 
I will take a look.
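Why Flate should suit these scans: it is the same lossless deflate/zlib compression that PNG uses internally and that PDF's FlateDecode filter expects, and long runs of identical palette indices (blank lines, solid strokes) compress very well. The sketch below is only an illustration under stated assumptions: it uses Python's zlib as a stand-in for hocr2pdf's Flate encoder, and the synthetic page (`make_page`, a hypothetical helper) is not Marcel's document.

```python
import zlib


def make_page(width: int, height: int) -> bytes:
    """Hypothetical 8-bit palette-index page: mostly white (index 0)
    with periodic bands of black (index 63) standing in for text."""
    rows = []
    for y in range(height):
        if y % 20 < 4:
            # 4-pixel-tall "text" band: alternating 8-pixel black/white runs
            row = bytes(63 if (x // 8) % 2 == 0 else 0 for x in range(width))
        else:
            row = bytes(width)  # blank white line (all zero bytes)
        rows.append(row)
    return b"".join(rows)


def flate_roundtrip(raw: bytes) -> tuple[bytes, bytes]:
    """Compress with zlib (the stream format FlateDecode decodes),
    then decompress to check that nothing was lost."""
    compressed = zlib.compress(raw, 9)
    return compressed, zlib.decompress(compressed)


raw = make_page(1000, 1400)
compressed, restored = flate_roundtrip(raw)
print(f"raw: {len(raw)} bytes, flate: {len(compressed)} bytes")
print("lossless:", restored == raw)
```

Unlike JPEG, the decompressed pixels are identical to the input, so the reduced 64-color PNG survives embedding without quality loss.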

        René

-- 
 ExactCODE GmbH, Jaegerstr. 67, DE-10117 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de

----------------------------------------------------------- 
If you wish to unsubscribe from this mailing, send mail to
[email protected] with a subject of: unsubscribe exact-image
