[ 
https://issues.apache.org/jira/browse/PDFBOX-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843728#comment-16843728
 ] 

Jonathan commented on PDFBOX-4541:
----------------------------------

About {{parseDirObject()}}, we currently do exactly that. We use a parser 
subclass anyway to create our referenced streams, so I've just overridden that 
method there. But it definitely does have an impact later when we write. As far 
as I understand, objects are set to be direct very selectively at the moment, 
and unless an object is explicitly set to be direct, {{COSWriter}} will append 
it to a writing queue, which is not really compatible with our linearisation.

We currently don't even use PDDocument, but operate on COSDocument directly, 
so it's not a big deal for us to use our own writer; in fact, we already do. 
I'd just like to avoid having to override the entire 
{{visitFromDictionary()}} and {{visitFromArray()}} when the only thing we need 
to change is a single if-statement.
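
To make the size of that change concrete, here is a minimal, self-contained sketch that evaluates the two conditions quoted in the issue description side by side. The types {{IndirectRef}}, {{Dict}} and {{Other}} are hypothetical stand-ins for {{COSObject}}, {{COSDictionary}} and any other value type; they are not the PDFBox classes, and the method names are made up for illustration.

```java
public class ConditionComparisonSketch {
    // Hypothetical stand-ins for the COS types; NOT the PDFBox classes.
    public interface Base {}
    public static final class IndirectRef implements Base {} // plays the role of COSObject
    public static final class Dict implements Base {}        // plays the role of COSDictionary
    public static final class Other implements Base {}       // any other value type

    // The condition quoted in the issue as the existing one.
    public static boolean currentCondition(boolean willEncrypt, boolean incrementalUpdate, Base subValue) {
        return willEncrypt || incrementalUpdate || subValue instanceof Dict || subValue == null;
    }

    // The condition quoted in the issue as the replacement.
    public static boolean proposedCondition(boolean willEncrypt, boolean incrementalUpdate, Base subValue) {
        return willEncrypt || incrementalUpdate || subValue == null || !(subValue instanceof IndirectRef);
    }

    public static void main(String[] args) {
        Base[] samples = { new IndirectRef(), new Dict(), new Other(), null };
        for (Base b : samples) {
            String name = (b == null) ? "null" : b.getClass().getSimpleName();
            // Both flags are false so that only the type checks matter.
            System.out.println(name
                    + ": current=" + currentCondition(false, false, b)
                    + ", proposed=" + proposedCondition(false, false, b));
        }
    }
}
```

With both flags false, the two conditions differ only for a non-null value that is neither a dictionary nor a reference wrapper (the {{Other}} case): the existing condition is false there and the replacement is true. If, as the issue description suggests, the true branch is the one that emits a reference instead of writing the object in place, that is the case that stops objects from being written more than once.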

But I do think a BaseWriter sounds like a good idea. I've just looked through 
the class; maybe it would make sense to separate out the incremental updates? 
Then object queue management, {{visitFromDocument()}}, {{visitFromArray()}} and 
{{visitFromDictionary()}} would become simpler and could perhaps be 
implemented in a subclass.
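
One way such a split could look is a base class that owns the traversal and exposes the decision as an overridable hook. The sketch below is only an illustration under stated assumptions: {{BaseWriter}}, {{AlwaysReferenceWriter}}, {{Ref}} and {{writeAsReference()}} are hypothetical names, none of them exist in PDFBox, and the "always reference" policy in the subclass is chosen purely to show the override.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class WriterHookSketch {
    // Hypothetical stand-in for a reference wrapper such as COSObject.
    public static final class Ref {
        public final Object target;
        public Ref(Object target) { this.target = target; }
    }

    // Hypothetical base writer: the traversal lives here, the decision is a hook.
    public static class BaseWriter {
        public final List<String> out = new ArrayList<>();

        // Hook: a subclass overrides only this decision, not the whole visit method.
        protected boolean writeAsReference(Object target) {
            return target == null || target instanceof Map;
        }

        public void visitArray(List<Ref> items) {
            for (Ref r : items) {
                if (writeAsReference(r.target)) {
                    out.add("ref");                  // emit a reference, write the object later
                } else {
                    out.add("inline:" + r.target);   // write the value in place
                }
            }
        }
    }

    // Hypothetical subclass with a different policy: never write a referenced value in place.
    public static class AlwaysReferenceWriter extends BaseWriter {
        @Override
        protected boolean writeAsReference(Object target) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Ref> items = Arrays.asList(new Ref(42), new Ref(null));
        BaseWriter base = new BaseWriter();
        base.visitArray(items);
        BaseWriter custom = new AlwaysReferenceWriter();
        custom.visitArray(items);
        System.out.println(base.out + " vs " + custom.out);
    }
}
```

The point of the design is that a custom writer would carry a one-method override rather than a copy of the visit methods.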

> Incorrect? handling of direct/indirect objects
> ----------------------------------------------
>
>                 Key: PDFBOX-4541
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4541
>             Project: PDFBox
>          Issue Type: Wish
>          Components: Parsing, Writing
>    Affects Versions: 2.0.14
>            Reporter: Jonathan
>            Priority: Major
>         Attachments: broken_censored.pdf, linearized.pdf, 
> linearized_withfix.pdf
>
>
> We ran into some issues concerning blank pages in some of our resulting PDF 
> documents. Investigation showed that some objects which were referenced were 
> never actually written. We then noticed that these objects were never written 
> because they were missing the `isDirect` flag. We were able to mitigate this issue 
> by adding
> {code:java}
> if (retval != null) {
>     retval.setDirect(true);
> }
> return retval;
> {code}
> at the end of `BaseParser.parseDirObject()`.
> While the PDFs were now displayed correctly, QPDF's check reported erroneous 
> hint tables. The offsets there were calculated incorrectly because the 
> objects were now written not only once, but, in fact, several times in places 
> where they should have been merely referenced. We eventually resolved this 
> issue by replacing the if-condition
> {code:java}
> if (willEncrypt || incrementalUpdate || subValue instanceof COSDictionary || 
> subValue == null)
> {code}
> in `COSWriter.visitFromArray(COSArray)` and 
> `COSWriter.visitFromDictionary(COSDictionary)` with
> {code:java}
> if (willEncrypt || incrementalUpdate || subValue == null || !(subValue 
> instanceof COSObject))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
