[
https://issues.apache.org/jira/browse/PDFBOX-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191989#comment-17191989
]
Christian Appl edited comment on PDFBOX-4952 at 9/8/20, 7:26 AM:
-----------------------------------------------------------------
*Formatting:*
Aah sorry, IntelliJ IDEA does such things automatically. I did not think of
those optimizations and will turn that off!
*Concerning AbstractCOSWriter:*
+TLDR+: Having had a little time to think about it, I take back the
suggestion about introducing a common superclass (I will _not_ commit that).
As I see it, it would cause more issues than it solves.
I agree with your statement that this cannot easily be done for now. I will
shelve it and reconsider only if it turns out to be required to solve this
ticket.
+Reasons:+ Even when taken to the extreme and applied to the last corners of
COSWriter, this will result in code duplication.
One can, for example, introduce "visitX" methods for the different COS types
in the abstract class, and that will work fine, as the handling of most types
does not differ between the extending writers.
But even that cannot be applied to visitCOSArray, visitCOSDictionary and
visitDocument, which have to be implemented separately per subtype and will
partially contain code that other COSWriters also require.
"Partially" is also the problematic word for other methods like writeHead,
writeBody and writeTail. Parts of such methods will be identical, but not in
a way that a superclass implementation could easily provide.
Unless one splits the code into a multitude of tiny single-purpose methods,
sharing it is nearly impossible without rewriting and restructuring the most
central methods of COSWriter, which I decided not to risk.
With each of those modifications the risk that I break the whole thing grew
higher and higher - therefore said last commit will not be shared.
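To make the duplication problem a bit more concrete, here is a minimal,
self-contained sketch. It does not use PDFBox at all: every class and method
name in it (AbstractWriterSketch, PlainWriterSketch, CompressingWriterSketch,
visitArray, ...) is a hypothetical stand-in, and the "reference" written by the
compressing variant is a simplification assumed purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the withdrawn common superclass (not PDFBox code).
abstract class AbstractWriterSketch {
    protected final StringBuilder out = new StringBuilder();

    // Simple types are written identically by every writer, so they fit here.
    void visitInteger(long value) { out.append(value); }
    void visitName(String name) { out.append('/').append(name); }

    // Containers must be specialized per writer.
    abstract void visitArray(List<?> items);

    // Shared dispatch over the toy object model (Long, String, List).
    void visit(Object o) {
        if (o instanceof Long) {
            visitInteger((Long) o);
        } else if (o instanceof String) {
            visitName((String) o);
        } else if (o instanceof List) {
            visitArray((List<?>) o);
        } else {
            throw new IllegalArgumentException("unsupported: " + o);
        }
    }

    String result() { return out.toString(); }
}

// Writes nested arrays inline.
class PlainWriterSketch extends AbstractWriterSketch {
    @Override
    void visitArray(List<?> items) {
        out.append('[');
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) out.append(' ');
            visit(items.get(i));
        }
        out.append(']');
    }
}

// Moves nested arrays into a separate pool and writes a reference instead.
class CompressingWriterSketch extends AbstractWriterSketch {
    final List<String> pooled = new ArrayList<>();

    @Override
    void visitArray(List<?> items) {
        // Brackets, separators and loop are duplicated from PlainWriterSketch ...
        out.append('[');
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) out.append(' ');
            Object item = items.get(i);
            if (item instanceof List) {
                // ... but this branch in the middle of the loop differs.
                PlainWriterSketch nested = new PlainWriterSketch();
                nested.visit(item);
                pooled.add(nested.result());
                out.append(pooled.size()).append(" 0 R");
            } else {
                visit(item);
            }
        }
        out.append(']');
    }
}

class WriterSketchDemo {
    public static void main(String[] args) {
        List<Object> items =
            Arrays.<Object>asList(1L, "Type", Arrays.<Object>asList(2L, 3L));
        PlainWriterSketch plain = new PlainWriterSketch();
        plain.visit(items);
        System.out.println(plain.result());
        CompressingWriterSketch compressing = new CompressingWriterSketch();
        compressing.visit(items);
        System.out.println(compressing.result() + " pooled=" + compressing.pooled);
    }
}
```

The two visitArray variants share most of their body, but the part that differs
sits in the middle of the loop. Pulling the shared part into the superclass
would mean cutting the method into several small hooks, which is the
fragmentation described above.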
was (Author: capsvd):
Aah, IntelliJ IDEA is doing such things automatically. Sorry, did not think of
those optimizations, will have to turn that off.
> PDF compression - object stream creation
> ----------------------------------------
>
> Key: PDFBOX-4952
> URL: https://issues.apache.org/jira/browse/PDFBOX-4952
> Project: PDFBox
> Issue Type: New Feature
> Components: PDModel
> Affects Versions: 2.0.21
> Reporter: Christian Appl
> Priority: Major
> Attachments: image-2020-09-07-09-47-30-172.png,
> image-2020-09-07-10-05-15-631.png
>
>
> I implemented a basic starting point for PDF compression, based on
> PDFBox 2.0.22-SNAPSHOT.
> I want to use this ticket to ask whether you would be interested in such a
> feature and in merging it into PDFBox.
> This is sort of a POC: it implements only some very basic functionality, which
> surely must and could be extended further, and it comes with only some
> very basic and simplistic unit tests.
> However, it is able to reduce the size of resulting documents, and creates
> object streams as defined in the PDF reference manual.
> *What it currently does:*
> It provides the bundling and compression of objects to objectstreams -and
> further applies simple content compression to a small selection of contents-.
> -To realize content compression, it provides a simple interface and abstract
> class for "ContentCompressor"s which search a document for specific content,
> that could be compressed and do compress that contents.-
> -Currently two content compressors exist:-
> -_ImageCompressor_-
> -Searches for simple images, that could be compressed using DCT.-
> -_UnencodedStreamCompressor_-
> -Searches the document for yet unencoded streams and applies a Flate
> compression where necessary.-
> -Both compressors can be parameterized using a centralized
> "CompressParameters" instance which is passed to a new "saveCompressed"
> method of PDDocument.-
> The compression is based on, modifies and is realized by a set of extensions
> for the "COSWriter" class. Basically it organizes objects, that are passed to
> the COSWriter in objectStreams -and applies content optimization where
> necessary and possible-.
> Currently this does support encryption, but does not support linearization of
> the compressed documents.
> *Caveat:*
> If this feature is interesting to you, then I would not expect you to simply
> merge this fork into 2.0.22. I am expecting that you would like to have some
> details and concepts changed and am ready to implement changes that would be
> required for this to work to your liking.
> *POC:*
> 4 resulting documents can be found in "target/test-output/compression" when
> "COSDocumentCompressionTest" is run.
> *The Pull request can be found on Github at:*
> [https://github.com/apache/pdfbox/pull/86]