[ 
https://issues.apache.org/jira/browse/PDFBOX-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843526#comment-16843526
 ] 

Tilman Hausherr commented on PDFBOX-4541:
-----------------------------------------

COSWriter is definitely too complex (though I don't have a better concept), 
and your linearization preparation code would probably have to "predict" all 
the optimizations done there. Any change in COSWriter would then probably 
break your application. Maybe dependency injection would be a solution? I.e. 
PDDocument has a default COSWriter, but users could also pass their own 
before calling save(). Maybe have a base COSWriter that provides some very 
simple classes only. If we can't come up with any, then just an interface.
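
To make the dependency-injection idea concrete, here is a minimal sketch. 
All names in it (DocumentWriter, DefaultDocumentWriter, SketchDocument, 
setWriter) are hypothetical stand-ins and not existing PDFBox API; it only 
shows the shape of "a default writer that callers may replace before 
save()", not how COSWriter would actually be split up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical writer abstraction; nothing like this exists in PDFBox today.
interface DocumentWriter {
    void write(SketchDocument doc, OutputStream out) throws IOException;
}

// Stand-in for the built-in writer that is used unless the caller replaces it.
class DefaultDocumentWriter implements DocumentWriter {
    @Override
    public void write(SketchDocument doc, OutputStream out) throws IOException {
        out.write(("default:" + doc.getContent()).getBytes(StandardCharsets.UTF_8));
    }
}

// Stand-in for PDDocument: owns a default writer, lets callers swap it before save().
class SketchDocument {
    private final String content;
    private DocumentWriter writer = new DefaultDocumentWriter();

    SketchDocument(String content) {
        this.content = content;
    }

    String getContent() {
        return content;
    }

    // Hypothetical injection point for a user-supplied writer.
    void setWriter(DocumentWriter writer) {
        if (writer == null) {
            throw new IllegalArgumentException("writer must not be null");
        }
        this.writer = writer;
    }

    // Delegates the actual serialization to whichever writer is installed.
    void save(OutputStream out) throws IOException {
        writer.write(this, out);
    }
}

class WriterInjectionDemo {
    // Saves into memory and returns the bytes as a string, for easy inspection.
    static String saveToString(SketchDocument doc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            doc.save(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SketchDocument doc = new SketchDocument("pages");
        System.out.println(saveToString(doc)); // uses the default writer

        // A caller (e.g. linearization code) installs its own writer before saving.
        doc.setWriter((d, o) ->
                o.write(("linearized:" + d.getContent()).getBytes(StandardCharsets.UTF_8)));
        System.out.println(saveToString(doc)); // uses the injected writer
    }
}
```

With something along these lines, an application that needs full control 
over the output would depend only on the interface and would not have to 
track internal changes in the default writer.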

I doubt that any change made in parseDirObject() is the solution for 
something that happens much later. You are thinking only about 
linearization, but what about code that loads a PDF, makes some changes, and 
then saves? (Besides, you could just extend parseDirObject() yourself in your 
own parser, as is done in preflight.)

> Incorrect? handling of direct/indirect objects
> ----------------------------------------------
>
>                 Key: PDFBOX-4541
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4541
>             Project: PDFBox
>          Issue Type: Wish
>          Components: Parsing, Writing
>    Affects Versions: 2.0.14
>            Reporter: Jonathan
>            Priority: Major
>         Attachments: broken_censored.pdf, linearized.pdf, 
> linearized_withfix.pdf
>
>
> We ran into some issues concerning blank pages in some of our resulting PDF 
> documents. Investigation showed that some objects which were referenced were 
> never actually written. We then noticed that these objects were never written 
> because they missed the `isDirect` flag. We were able to mitigate this issue 
> by adding
> {code:java}
> if (retval != null) {
>     retval.setDirect(true);
> }
> return retval;
> {code}
> at the end of `BaseParser.parseDirObject()`.
> While the PDFs were now displayed correctly, QPDF's check reported erroneous 
> hint tables. The offsets there were calculated incorrectly because the 
> objects were now written not only once, but, in fact, several times in places 
> where they should have been merely referenced. We eventually resolved this 
> issue by replacing the if-condition
> {code:java}
> if (willEncrypt || incrementalUpdate || subValue instanceof COSDictionary || 
> subValue == null)
> {code}
> in `COSWriter.visitFromArray(COSArray)` and 
> `COSWriter.visitFromDictionary(COSDictionary)` with
> {code:java}
> if (willEncrypt || incrementalUpdate || subValue == null || !(subValue 
> instanceof COSObject))
> {code}


