[
https://issues.apache.org/jira/browse/PDFBOX-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838376#comment-16838376
]
Jonathan commented on PDFBOX-4541:
----------------------------------
I fired up the debugger again and I think I've found the underlying cause of
our issues. Basically, we build a predefined tree of objects that we want to
write, and everything gets screwed up when COSWriter tries to optimise this
structure. As we need to write two main bodies of the PDF (first page and
other pages), we can't use COSWriter.doWriteBody(COSDocument), which is the only
method that calls doWriteObjects(), which in turn is the only method that
processes objects added via addObjectToWrite(COSBase). That's why our objects
go missing.
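To make this concrete, here is a toy model of the call chain. It is not PDFBox
code: ToyWriter, its fields and writeCustomBody are made up for illustration,
and only the method names addObjectToWrite, doWriteObjects and doWriteBody echo
the ones in COSWriter.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: a hypothetical stand-in, not PDFBox's COSWriter.
class ToyWriter {
    private final Deque<String> pending = new ArrayDeque<>();
    public final List<String> written = new ArrayList<>();

    // Plays the role of addObjectToWrite: the object is queued, not written.
    public void addObjectToWrite(String obj) {
        pending.add(obj);
    }

    // Plays the role of doWriteObjects: the only place the queue is drained.
    public void doWriteObjects() {
        while (!pending.isEmpty()) {
            written.add(pending.removeFirst());
        }
    }

    // Plays the role of doWriteBody: the only caller of doWriteObjects.
    public void doWriteBody(List<String> roots) {
        roots.forEach(this::addObjectToWrite);
        doWriteObjects();
    }

    // Hypothetical custom path that emits its bodies itself and
    // therefore never reaches doWriteObjects.
    public void writeCustomBody(List<String> predefined) {
        written.addAll(predefined);
    }

    public int pendingCount() {
        return pending.size();
    }
}
```

If the writer queues an object while we are on the custom path, it stays in
the pending queue and never reaches the output, unless something calls
doWriteObjects() afterwards.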
The reason we get the reference '1 0 R' instead is that we
predefine object numbers, but COSWriter's internal counter 'number' isn't
updated accordingly. When COSWriter then tries to perform an optimisation and
attempts to write the dictionary directly, a new object number starting at 1 is
generated and the dictionary is added to the queue, which is never written.
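The numbering clash can be sketched the same way. Again this is a toy model
under my own assumptions, not PDFBox code: ToyNumbering, predefine, keyFor and
syncCounter are hypothetical names, and syncCounter is only one conceivable
mitigation, not something COSWriter offers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a hypothetical stand-in for the writer's numbering logic.
class ToyNumbering {
    // Internal counter; it knows nothing about numbers assigned from outside.
    private long number = 0;
    private final Map<Object, Long> keys = new LinkedHashMap<>();

    // The caller assigns an object number directly; the counter is untouched.
    public void predefine(Object obj, long objectNumber) {
        keys.put(obj, objectNumber);
    }

    // The writer asks for a number; unseen objects get the next counter value.
    public long keyFor(Object obj) {
        return keys.computeIfAbsent(obj, o -> ++number);
    }

    // Hypothetical mitigation: move the counter past the highest known number.
    public void syncCounter() {
        for (long n : keys.values()) {
            number = Math.max(number, n);
        }
    }
}
```

With objects predefined as 1 and 2, the first unseen dictionary is also handed
number 1, which matches the stray '1 0 R' we see. Calling syncCounter() first
would hand out 3 instead, but it does nothing about the queue that is never
written.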
Can you think of a smarter way to mitigate these problems?
> Incorrect? handling of direct/indirect objects
> ----------------------------------------------
>
> Key: PDFBOX-4541
> URL: https://issues.apache.org/jira/browse/PDFBOX-4541
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing, Writing
> Affects Versions: 2.0.14
> Reporter: Jonathan
> Priority: Major
> Attachments: broken_censored.pdf, linearized.pdf,
> linearized_withfix.pdf
>
>
> We ran into some issues concerning blank pages in some of our resulting PDF
> documents. Investigation showed that some objects which were referenced were
> never actually written. We then noticed that these objects were never written
> because they were missing the `isDirect` flag. We were able to mitigate this issue
> by adding
> {code:java}
> if (retval != null) {
>     retval.setDirect(true);
> }
> return retval;
> {code}
> at the end of `BaseParser.parseDirObject()`.
> While the PDFs were now displayed correctly, QPDF's check reported erroneous
> hint tables. The offsets there were calculated incorrectly because the
> objects were now written not just once but several times, in places
> where they should have been merely referenced. We eventually resolved this
> issue by replacing the if-condition
> {code:java}
> if (willEncrypt || incrementalUpdate || subValue instanceof COSDictionary
>         || subValue == null)
> {code}
> in `COSWriter.visitFromArray(COSArray)` and
> `COSWriter.visitFromDictionary(COSDictionary)` with
> {code:java}
> if (willEncrypt || incrementalUpdate || subValue == null
>         || !(subValue instanceof COSObject))
> {code}