Hi Marcel,
On Jan 2, 2014, at 12:33, Marcel Martin <[email protected]> wrote:
> thanks for the hocr2pdf tool, it's really useful and I use it a lot. I have now
> discovered that hocr2pdf seems to convert PNG images to JPEG before embedding
> them and I would like to suggest that it does not do so.
>
> I reduce the file size of scanned documents (TIFFs) with:
>
> convert document.tiff -level 25% -colors 64 document.png
>
> Since the documents are mostly black and white, such a file is a lot smaller
> than the corresponding JPEG and the compression is even lossless.
>
> After OCR'ing, when I use hocr2pdf to create a PDF, the image is converted to
> JPEG, however, and the file becomes larger. This can be seen by running
> pdfimages on the PDF. PDFs do support images in a similar format to PNG so it
> would save me a lot of disk space if that were used instead of the conversion
> to JPEG.
>
> My command lines are:
> tesseract document.tiff document -l deu hocr
> hocr2pdf -i document.png -o document.pdf < document.html
> pdfimages -list document.pdf
Thank you for your email. I went through old emails and quickly added --compress
and --quality to the hocr2pdf frontend. I hope --compress flate produces
reasonable results with your file. If not, let me know with an example image and
I will take a look.
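For reference, a minimal sketch of how the call could look with your file
names. It assumes --compress takes its value as a separate argument, as written
above; the hocr_to_pdf wrapper and the DRY_RUN variable are only illustrative
and not part of hocr2pdf:

```shell
# hocr_to_pdf IMAGE HOCR PDF
# Sketch: embed IMAGE with lossless Flate compression instead of JPEG.
# Assumption: the option is spelled "--compress flate" (value as a separate word).
# With DRY_RUN=1 the command is only printed, not executed.
hocr_to_pdf() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'hocr2pdf -i %s -o %s --compress flate < %s\n' "$1" "$3" "$2"
    else
        hocr2pdf -i "$1" -o "$3" --compress flate < "$2"
    fi
}

# Example with the file names from your mail:
#   hocr_to_pdf document.png document.html document.pdf
#   pdfimages -list document.pdf
```

Running pdfimages -list on the result, as you already do, should then show
whether the image is still stored as JPEG.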
René
--
ExactCODE GmbH, Jaegerstr. 67, DE-10117 Berlin
http://exactcode.com | http://exactscan.com | http://ocrkit.com |
http://t2-project.org | http://rene.rebe.de