[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539054#comment-16539054
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
------------------------------------------

{quote}Most of the time people complain about time. There are almost never 
complaints about size.
{quote}
They will complain about size when they start to write high resolution prepress 
PDFs... There is a difference between a 60 MB PDF and a 80 MB PDF, especially 
if you work with many PDFs and have to upload them to the print shop. If you 
only care about web with low resolution images then of course the size is not 
that important. As always, it depends on the use case.
{quote}It is possible in PDF specification, but it hasn't been implemented for 
PDFBox.
{quote}
Do you have a pointer (e.g. pagenumber in the PDF 1.7 spec) where the index 
image format in PDF is described? I did not find this.
{quote}Try {{PDImageXObject.createFromByteArray()}}. 
{quote}
 Ah, I see. So their is already an API for this, it just does not really handle 
PNGs yet (i.e. it will load the PNG as BufferedImage and do a lossless 
compression in opposite to directly reusing the already optimized IDAT chunk - 
which of course would also be much faster). I'll try to find some time to 
implement IDAT reusing.

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-4184
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4184
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.9
>            Reporter: Emmeran Seehuber
>            Priority: Minor
>             Fix For: 2.0.12, 3.0.0 PDFBox
>
>         Attachments: 16bit.png, LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> lossless_predictor_based_imageencoding_v6.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf, 
> size_compare.txt
>
>
> The attached patch add support to write 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvements when writing lossless images, as 
> the images are currently not efficiently encoded. I.e. you could use PNG 
> encodings to get a better compression. (By adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> tradeoff speed vs compression ratio. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Reply via email to