[jira] [Comment Edited] (PDFBOX-4952) PDF compression - object stream creation

Michael Klink (Jira) Tue, 17 Aug 2021 03:19:06 -0700


    [ https://issues.apache.org/jira/browse/PDFBOX-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400306#comment-17400306 ]


Michael Klink edited comment on PDFBOX-4952 at 8/17/21, 10:18 AM:
------------------------------------------------------------------

[~capSVD],
{quote}According to PDF 32000-1, a signature dictionary is not listed among the 
structures that must never be compressed, so it should be compressible and 
could be included in an object stream.
{quote}
A signature dictionary by design cannot be compressed.
On the one hand, the signed byte ranges (the /ByteRange entry) are given as byte 
offsets of the start and end of the signature Contents value in the file; inside 
a compressed stream, such starts and ends have no exact byte offsets. On the 
other hand, the byte ranges and the signature value have to be patched into the 
file after everything else has been written. Changing part of a compressed 
stream, however, would usually imply changes to the rest of the document (for 
example, the stream length and the offsets of all following objects), because 
the original placeholder and the patched-in actual values compress differently.

Thus,
{quote}Is it acceptable for signature dictionaries to remain 
uncompressed?{quote}
it is not only acceptable, it is actually mandatory.
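To make the offset argument concrete, here is a small, self-contained Java sketch using only the JDK (it is not PDFBox API). It writes a toy signature dictionary with a zero-filled /Contents placeholder, computes a ByteRange that excludes the hex string, and then patches in a pretend signature of the same hex length. The class name, the toy file layout, the 1024-digit placeholder and the random "signature" bytes are all assumptions for illustration; in a real file the /ByteRange array is itself a patched placeholder, which is left out here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

public class SignaturePlaceholderDemo {

    // Hypothetical reserved size of the /Contents hex string (hex digits, not bytes).
    static final int PLACEHOLDER_HEX_DIGITS = 1024;

    // Toy file: a signature dictionary written uncompressed as a top-level object.
    static byte[] buildFile(String contentsHex) {
        String text = "%PDF-1.7\n5 0 obj\n<< /Type /Sig /Filter /Adobe.PPKLite /Contents <"
                + contentsHex + "> >>\nendobj\n%%EOF\n";
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    static String placeholder() {
        return "0".repeat(PLACEHOLDER_HEX_DIGITS);
    }

    // Random bytes standing in for a real CMS signature, hex-encoded and
    // zero-padded to the placeholder size so the file length does not change.
    static String fakeSignatureHex(int signatureBytes, long seed) {
        byte[] sig = new byte[signatureBytes];
        new Random(seed).nextBytes(sig);
        StringBuilder hex = new StringBuilder();
        for (byte b : sig) {
            hex.append(String.format("%02x", b & 0xff));
        }
        while (hex.length() < PLACEHOLDER_HEX_DIGITS) {
            hex.append('0');
        }
        return hex.toString();
    }

    // [offset1 length1 offset2 length2]: everything except the hex string;
    // the excluded gap runs from '<' through '>' inclusive.
    static long[] byteRange(byte[] file) {
        String text = new String(file, StandardCharsets.US_ASCII);
        int gapStart = text.indexOf("/Contents <") + "/Contents ".length();
        int gapEnd = text.indexOf('>', gapStart) + 1;
        return new long[] {0, gapStart, gapEnd, file.length - gapEnd};
    }

    // True if both files agree on every byte covered by the byte ranges.
    static boolean signedBytesEqual(byte[] a, byte[] b, long[] range) {
        int len1 = (int) range[1];
        int off2 = (int) range[2];
        int end2 = off2 + (int) range[3];
        return Arrays.equals(a, 0, len1, b, 0, len1)
                && Arrays.equals(a, off2, end2, b, off2, end2);
    }

    // Size of the data once Flate-encoded, as it would be inside an object stream.
    static int deflatedLength(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] before = buildFile(placeholder());
        byte[] after = buildFile(fakeSignatureHex(300, 42L));
        long[] range = byteRange(before);
        System.out.println("ByteRange " + Arrays.toString(range));
        System.out.println("uncompressed: " + before.length + " -> " + after.length
                + ", signed bytes unchanged: " + signedBytesEqual(before, after, range));
        System.out.println("deflated: " + deflatedLength(before) + " -> " + deflatedLength(after));
    }
}
```

Run as is, the file length and every signed byte stay identical after patching, while the deflated size grows considerably once the zero padding is replaced by signature-like data. If the dictionary sat in an object stream, that growth would change the stream's /Length and shift every later object, which is the ripple effect described in the comment.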



> PDF compression - object stream creation
> ----------------------------------------
>
>                 Key: PDFBOX-4952
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4952
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: PDModel
>    Affects Versions: 2.0.21
>            Reporter: Christian Appl
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: 102_Spot_to_CMYK_X1a.pdf, 
> 102_Spot_to_CMYK_X1a_unc_BAD-3.0.0.pdf, 
> 102_Spot_to_CMYK_X1a_unc_GOOD-2.0.22.pdf, image-2020-09-07-09-47-30-172.png, 
> image-2020-09-07-10-05-15-631.png, image-2021-08-17-10-07-33-682.png, 
> image-2021-08-17-10-10-21-418.png, image-2021-08-17-10-21-00-352.png, 
> image-2021-08-17-10-24-44-999.png, image-2021-08-17-10-56-48-431.png
>
>
> I implemented a basic starting point for PDF compression, based on 
> PDFBox 2.0.22-SNAPSHOT.
> I want to use this ticket to ask whether you would be interested in such a 
> feature and in merging it into PDFBox.
> This is a kind of POC: it implements only some very basic functionality, 
> which could and surely must be extended further, and it includes only a few 
> very basic, simplistic unit tests.
> However, it is able to reduce the size of the resulting documents, and it 
> creates object streams as defined in the PDF reference manual.
> *What it currently does:*
> It bundles objects into object streams and compresses them -and further 
> applies simple content compression to a small selection of contents-.
> -To realize content compression, it provides a simple interface and abstract 
> class for "ContentCompressor"s which search a document for specific content, 
> that could be compressed and do compress that contents.-
> -Currently two content compressors exist:-
>  -_ImageCompressor_-
>  -Searches for simple images, that could be compressed using DCT.-
> -_UnencodedStreamCompressor_-
>  -Searches the document for yet unencoded streams and applies a Flate 
> compression where necessary.-
> -Both compressors can be parameterized using a centralized 
> "CompressParameters" instance which is passed to a new "saveCompressed" 
> method of PDDocument.-
> The compression is built on, modifies, and is realized by a set of extensions 
> to the "COSWriter" class. Basically, it organizes the objects passed to the 
> COSWriter into object streams -and applies content optimization where 
> necessary and possible-.
> Currently, this supports encryption but does not support linearization of 
> the compressed documents.
> *Caveat:*
> If this feature interests you, I would not expect you to simply merge this 
> fork into 2.0.22. I expect that you will want some details and concepts 
> changed, and I am ready to implement whatever changes are required to make 
> this work to your liking.
> *POC:*
> Four resulting documents are written to "target/test-output/compression" 
> when "COSDocumentCompressionTest" is run.
> *The Pull request can be found on Github at:*
>  [https://github.com/apache/pdfbox/pull/86]
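As background for the object stream part of this feature, here is a hypothetical Java sketch using only the JDK; it is not the code from the pull request and does not use PDFBox classes. It follows the layout in ISO 32000-1, section 7.5.7: the decoded data starts with N pairs of object number and offset, offsets are measured from the first object, /First is the byte offset of that first object, and objects are stored without obj/endobj keywords. The names ObjectStreamSketch, PdfObject and canBeCompressed are illustrative assumptions; the eligibility check encodes the spec's exclusions plus signature dictionaries, as explained in the comment.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class ObjectStreamSketch {

    // Hypothetical stand-in for an indirect object whose value is already
    // serialized to PDF syntax (ASCII only, so one char == one byte).
    record PdfObject(int number, int generation, boolean isStream,
                     boolean isEncryptionDictionary, boolean isSignatureDictionary, String body) {}

    // /N, /First, decoded and Flate-encoded data, and objects kept outside.
    record ObjectStream(int n, int first, String decoded, byte[] encoded, List<PdfObject> leftOut) {}

    // Exclusions from ISO 32000-1, 7.5.7 (streams, non-zero generation, the
    // encryption dictionary) plus signature dictionaries.
    // Not modeled: the /Length object of an object stream, linearization rules.
    static boolean canBeCompressed(PdfObject o) {
        return !o.isStream() && o.generation() == 0
                && !o.isEncryptionDictionary() && !o.isSignatureDictionary();
    }

    static ObjectStream build(List<PdfObject> objects) {
        StringBuilder header = new StringBuilder(); // "number offset" pairs
        StringBuilder body = new StringBuilder();   // values without obj/endobj
        List<PdfObject> leftOut = new ArrayList<>();
        int n = 0;
        for (PdfObject o : objects) {
            if (!canBeCompressed(o)) {
                leftOut.add(o); // stays a regular top-level indirect object
                continue;
            }
            // Offsets are relative to the first object, i.e. to /First.
            header.append(o.number()).append(' ').append(body.length()).append(' ');
            body.append(o.body()).append('\n');
            n++;
        }
        int first = header.length(); // equals the byte offset because of ASCII
        String decoded = header.toString() + body;
        byte[] encoded = deflate(decoded.getBytes(StandardCharsets.US_ASCII));
        return new ObjectStream(n, first, decoded, encoded, leftOut);
    }

    // zlib-format deflate output, which is what /FlateDecode expects.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int len = deflater.deflate(buf);
            out.write(buf, 0, len);
        }
        deflater.end();
        return out.toByteArray();
    }

    static String dictionary(ObjectStream s) {
        return "<< /Type /ObjStm /N " + s.n() + " /First " + s.first()
                + " /Filter /FlateDecode /Length " + s.encoded().length + " >>";
    }

    public static void main(String[] args) {
        List<PdfObject> objects = List.of(
                new PdfObject(1, 0, false, false, false, "<< /Type /Catalog /Pages 2 0 R >>"),
                new PdfObject(2, 0, false, false, false, "<< /Type /Pages /Kids [] /Count 0 >>"),
                new PdfObject(4, 0, false, false, true, "<< /Type /Sig /Contents <00> >>"));
        ObjectStream s = build(objects);
        System.out.println(dictionary(s));
        System.out.print(s.decoded());
        System.out.println("kept outside: " + s.leftOut().size());
    }
}
```

Objects that are left out stay ordinary indirect objects. A complete writer would also reference each compressed object from a cross-reference stream through a type 2 entry (object stream number and index within the stream), which this sketch does not produce.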



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
