[ https://issues.apache.org/jira/browse/PDFBOX-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239981#comment-17239981 ]

Tilman Hausherr commented on PDFBOX-4952:
-----------------------------------------

I get this with the file 
PDFBOX-4777-gs-bugzilla695040-hang-090214-050329-164.pdf when saving it 
(line numbers may not match the actual ones in the trunk):
{code}
java.lang.NullPointerException
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectStreamObject(COSParser.java:781)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:633)
        at org.apache.pdfbox.pdfparser.COSParser.dereferenceCOSObject(COSParser.java:582)
        at org.apache.pdfbox.cos.COSObject.getObject(COSObject.java:112)
        at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDictionary(COSWriter.java:1229)
        at org.apache.pdfbox.cos.COSDictionary.accept(COSDictionary.java:1387)
        at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:674)
        at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:708)
        at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:604)
        at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:507)
        at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1305)
        at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:407)
        at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1578)
        at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1466)
        at org.apache.pdfbox.pdmodel.PDDocument.saveCompressed(PDDocument.java:959)
        at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:916)
        at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:903)
{code}


> PDF compression - object stream creation
> ----------------------------------------
>
>                 Key: PDFBOX-4952
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4952
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: PDModel
>    Affects Versions: 2.0.21
>            Reporter: Christian Appl
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: image-2020-09-07-09-47-30-172.png, 
> image-2020-09-07-10-05-15-631.png
>
>
> I implemented a basic starting point for PDF compression, based on 
> PDFBox 2.0.22-SNAPSHOT.
> I want to use this ticket to ask whether you would be interested in such a 
> feature and in merging it into PDFBox.
> This is a sort of POC: it implements only very basic functionality, which 
> surely could and must be extended further, and it has only very basic and 
> simplistic unit tests.
> However, it is able to reduce the size of resulting documents, and it creates 
> object streams as defined in the PDF reference manual.
> *What it currently does:*
>  It provides the bundling and compression of objects to objectstreams -and 
> further applies simple content compression to a small selection of contents-.
> -To realize content compression, it provides a simple interface and abstract 
> class for "ContentCompressor"s which search a document for specific content, 
> that could be compressed and do compress that contents.-
> -Currently two content compressors exist:-
>  -_ImageCompressor_-
>  -Searches for simple images, that could be compressed using DCT.-
> -_UnencodedStreamCompressor_-
>  -Searches the document for yet unencoded streams and applies a Flate 
> compression where necessary.-
> -Both compressors can be parameterized using a centralized 
> "CompressParameters" instance which is passed to a new "saveCompressed" 
> method of PDDocument.-
> The compression builds on and modifies the "COSWriter" class through a set 
> of extensions. Basically, it organizes the objects that are passed to the 
> COSWriter into object streams -and applies content optimization where 
> necessary and possible-.
> Currently this supports encryption, but it does not support linearization of 
> the compressed documents.
> *Caveat:*
>  If this feature is interesting to you, then I would not expect you to simply 
> merge this fork into 2.0.22. I am expecting that you would like to have some 
> details and concepts changed and am ready to implement changes that would be 
> required for this to work to your liking.
> *POC:*
>  4 resulting documents can be found in "target/test-output/compression" when 
> "COSDocumentCompressionTest" is run.
> *The Pull request can be found on Github at:*
>  [https://github.com/apache/pdfbox/pull/86]
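
For readers unfamiliar with object streams, the bundling step described in the quoted issue can be illustrated with a minimal sketch. This is not the code from the pull request and does not use PDFBox; the class `ObjectStreamSketch` and its `pack` method are hypothetical names chosen for illustration. It assumes each object is already serialized as ASCII text and only builds the decoded payload: a header of "object number, offset" pairs followed by the objects themselves, with /N as the object count and /First as the byte offset of the first object. Flate compression and the cross-reference stream are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of PDFBox or the pull request.
public class ObjectStreamSketch {

    /** Decoded object stream payload plus the values for its /N and /First entries. */
    public static final class Packed {
        public final String payload;
        public final int n;
        public final int first;

        public Packed(String payload, int n, int first) {
            this.payload = payload;
            this.n = n;
            this.first = first;
        }
    }

    /**
     * Packs already-serialized objects (object number -> ASCII text) into one payload.
     * Assumption: all text is ASCII, so character counts equal byte counts.
     */
    public static Packed pack(Map<Integer, String> objects) {
        StringBuilder header = new StringBuilder();
        StringBuilder body = new StringBuilder();
        for (Map.Entry<Integer, String> entry : objects.entrySet()) {
            // Offsets are relative to the first object, i.e. to the start of the body.
            header.append(entry.getKey()).append(' ').append(body.length()).append(' ');
            body.append(entry.getValue()).append('\n');
        }
        // /First is where the body starts, which is the length of the header.
        return new Packed(header.toString() + body, objects.size(), header.length());
    }

    public static void main(String[] args) {
        Map<Integer, String> objects = new LinkedHashMap<>();
        objects.put(11, "<< /Type /Font >>");
        objects.put(12, "[ 1 2 3 ]");
        Packed packed = pack(objects);
        System.out.println("/N " + packed.n + " /First " + packed.first);
        System.out.print(packed.payload);
    }
}
```

A reader resolves a compressed object by looking up its offset in the header and seeking to /First plus that offset in the decoded stream, which is the step where the stack trace above fails in `parseObjectStreamObject`.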



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
