[
https://issues.apache.org/jira/browse/PDFBOX-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837445#comment-16837445
]
Tilman Hausherr edited comment on PDFBOX-4541 at 5/11/19 4:32 AM:
------------------------------------------------------------------
The changes in branch 2-45 in PDFBOX-45 are rather simple; they are only in
COSWriter and PDDocument and relate only to people who want to make
incremental saves. But I ran into some trouble in 2017 and in 2019 while
working on them, i.e. double objects / orphan objects (these changes were
reverted or never made it into the code). This happens if an object is saved
both directly and indirectly, which happens when the object is added to a
certain table in COSWriter.
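A toy model may make the double/orphan failure mode more concrete. It is a hypothetical sketch, not COSWriter's real structure: {{DoubleWriteSketch}}, {{Obj}}, {{inline}}, {{enqueue}} and {{reference}} are invented names, and the {{table}} map only stands in for "a certain table in COSWriter". When one object is written inline at its use site and is also added to the table, its body appears twice in the output and the table copy is referenced by nothing.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy writer, NOT PDFBox's COSWriter: it only models what
// happens when one object is both written inline and queued as indirect.
final class DoubleWriteSketch {
    // Stand-in for a PDF object; equality is identity (equals() not overridden).
    static final class Obj {
        final String body;
        Obj(String body) { this.body = body; }
    }

    // Stand-in for the writer's table of objects written as "n 0 obj ... endobj".
    private final Map<Obj, Integer> table = new LinkedHashMap<>();
    private final Set<Integer> referenced = new HashSet<>();
    private final StringBuilder out = new StringBuilder();

    // Write the object directly at the current position.
    void inline(Obj o) {
        out.append(o.body).append('\n');
    }

    // Add the object to the table (next free object number) without referencing it.
    void enqueue(Obj o) {
        if (!table.containsKey(o)) {
            table.put(o, table.size() + 1);
        }
    }

    // Write "n 0 R" and make sure the object is in the table.
    void reference(Obj o) {
        enqueue(o);
        int number = table.get(o);
        referenced.add(number);
        out.append(number).append(" 0 R\n");
    }

    // Append all queued objects as indirect objects; call once at the end.
    String finish() {
        for (Map.Entry<Obj, Integer> e : table.entrySet()) {
            out.append(e.getValue()).append(" 0 obj ").append(e.getKey().body).append(" endobj\n");
        }
        return out.toString();
    }

    // Number of queued objects that no "n 0 R" points to.
    int orphanCount() {
        int count = 0;
        for (int number : table.values()) {
            if (!referenced.contains(number)) {
                count++;
            }
        }
        return count;
    }

    static int occurrences(String text, String needle) {
        int count = 0;
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Obj font = new Obj("<< /Type /Font >>");
        DoubleWriteSketch w = new DoubleWriteSketch();
        w.inline(font);   // saved directly at its use site ...
        w.enqueue(font);  // ... and also added to the table
        String pdf = w.finish();
        System.out.print(pdf);
        System.out.println("copies: " + occurrences(pdf, font.body) + ", orphans: " + w.orphanCount());
    }
}
```

Running {{main}} prints the font dictionary twice (once inline, once as "1 0 obj") and reports one orphan. Calling {{reference}} at every use site instead yields a single indirect copy and no orphans.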
Your code change suggestion would remove an optimization that is part of our
code, i.e. that of the objects that are indirect, only dictionaries are written
as indirect objects; other objects are written directly. (And btw there's
another such optimization: resources and XObject dictionaries are always
written as direct objects.)
I thought per PDFBOX-4540 that you're making all objects indirect?
I tried running the change in {{parseDirObject()}} alone; it seemed to make
sense (keep direct objects as such), but build tests failed (many in
{{PDFMergerUtilityTest}}). I thought that "Dir" stood for "direct", but it
doesn't: it is also called by {{parseFileObject()}}, i.e. it parses the object
that comes after "1234 0 obj".
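The point that one routine parses both nested values and the body of a top-level object can be illustrated with a toy parser. This is a hypothetical sketch, not PDFBox's BaseParser: {{DirectFlagSketch}}, {{parseValue()}} and {{parseIndirectObject()}} are invented and only play the roles of {{parseDirObject()}} and {{parseFileObject()}}; the {{direct}} field stands in for the {{isDirect}} flag, and the hypothetical {{markDirectInParseValue}} switch models the blanket {{setDirect(true)}}. With that switch on, the body after "1234 0 obj" is flagged as direct too; letting the caller set the flag only for array elements avoids that. The grammar covers just integers, names and arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy parser, NOT PDFBox's BaseParser/COSParser. It handles only
// integers, names (/Foo) and arrays, plus "<num> <gen> obj <value> endobj".
final class DirectFlagSketch {
    // Parsed value; `direct` stands in for the COSBase isDirect flag.
    static final class Value {
        final Object content; // Long, String (name without '/') or List<Value>
        boolean direct;
        Value(Object content) { this.content = content; }
    }

    private final String[] tokens;
    private final boolean markDirectInParseValue; // models the blanket setDirect(true)
    private int pos;

    DirectFlagSketch(String text, boolean markDirectInParseValue) {
        // Naive tokenizer: put spaces around brackets, then split on whitespace.
        this.tokens = text.replace("[", " [ ").replace("]", " ] ").trim().split("\\s+");
        this.markDirectInParseValue = markDirectInParseValue;
    }

    // Plays the role of parseDirObject(): parses exactly one value.
    Value parseValue() {
        String t = tokens[pos++];
        Value v;
        if (t.equals("[")) {
            List<Value> items = new ArrayList<>();
            while (!tokens[pos].equals("]")) {
                Value item = parseValue();
                item.direct = true; // the caller knows an array element is direct
                items.add(item);
            }
            pos++; // consume "]"
            v = new Value(items);
        } else if (t.startsWith("/")) {
            v = new Value(t.substring(1));
        } else {
            v = new Value(Long.parseLong(t));
        }
        if (markDirectInParseValue) {
            v.direct = true; // blanket flag: also hits the body of "1234 0 obj"
        }
        return v;
    }

    // Plays the role of parseFileObject(): "<num> <gen> obj <value> endobj".
    Value parseIndirectObject() {
        Long.parseLong(tokens[pos++]); // object number (ignored in this sketch)
        Long.parseLong(tokens[pos++]); // generation number (ignored in this sketch)
        expect("obj");
        Value body = parseValue(); // same routine as for nested values
        expect("endobj");
        return body;
    }

    private void expect(String keyword) {
        if (!tokens[pos++].equals(keyword)) {
            throw new IllegalStateException("expected " + keyword);
        }
    }

    public static void main(String[] args) {
        String src = "1234 0 obj [1 /Name [2]] endobj";
        System.out.println("blanket flag:   body direct = "
                + new DirectFlagSketch(src, true).parseIndirectObject().direct);
        System.out.println("caller decides: body direct = "
                + new DirectFlagSketch(src, false).parseIndirectObject().direct);
    }
}
```

The design choice shown here is that only the caller knows whether a value is nested (direct) or is the body of an indirect object, so the flag is set there rather than inside the shared parsing routine.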
> Incorrect? handling of direct/indirect objects
> ----------------------------------------------
>
> Key: PDFBOX-4541
> URL: https://issues.apache.org/jira/browse/PDFBOX-4541
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing, Writing
> Affects Versions: 2.0.14
> Reporter: Jonathan
> Priority: Major
> Attachments: broken_censored.pdf, linearized.pdf,
> linearized_withfix.pdf
>
>
> We ran into some issues concerning blank pages in some of our resulting PDF
> documents. Investigation showed that some objects which were referenced were
> never actually written. We then noticed that these objects were never written
> because they lacked the `isDirect` flag. We were able to mitigate this issue
> by adding
> {code:java}
> if (retval != null)
> {
>     retval.setDirect(true);
> }
> return retval;
> {code}
> at the end of `BaseParser.parseDirObject()`.
> While the PDFs were now displayed correctly, QPDF's check reported erroneous
> hint tables. The offsets there were calculated incorrectly because the
> objects were now written not only once but, in fact, several times, in places
> where they should have been merely referenced. We eventually resolved this
> issue by replacing the if-condition
> {code:java}
> if (willEncrypt || incrementalUpdate
>         || subValue instanceof COSDictionary || subValue == null)
> {code}
> in `COSWriter.visitFromArray(COSArray)` and
> `COSWriter.visitFromDictionary(COSDictionary)` with
> {code:java}
> if (willEncrypt || incrementalUpdate
>         || subValue == null || !(subValue instanceof COSObject))
> {code}
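A toy model of the two conditions shows what the proposed change does. It is a hypothetical sketch, not COSWriter: the {{Kind}} enum stands in for the runtime class of {{subValue}}, and it assumes that {{subValue}} is the object the {{COSObject}} reference resolves to (which the {{instanceof COSDictionary}} test suggests). Under that assumption, {{!(subValue instanceof COSObject)}} is true for almost every resolved value, so the proposed condition writes nearly every indirect value as a reference, i.e. the "only dictionaries stay indirect" optimization mentioned in the comment disappears.

```java
// Hypothetical model of the quoted if-conditions, NOT COSWriter itself.
// Question answered: when the writer meets an indirect reference (COSObject)
// whose resolved target has kind `target`, does it write "n 0 R" (true)
// or inline the target's content (false)?
final class ReferenceConditionSketch {
    // Stand-in for the runtime class of subValue; null models an unresolved target.
    enum Kind { DICTIONARY, ARRAY, NUMBER, STRING, NAME, OBJECT_REFERENCE }

    // Original condition: unless encrypting or saving incrementally,
    // only dictionaries (or null) stay references.
    static boolean originalWritesReference(boolean willEncrypt, boolean incrementalUpdate, Kind target) {
        return willEncrypt || incrementalUpdate
                || target == Kind.DICTIONARY || target == null;
    }

    // Proposed condition: anything that is not itself a COSObject stays a reference.
    // (The null check is redundant here but mirrors the quoted condition.)
    static boolean proposedWritesReference(boolean willEncrypt, boolean incrementalUpdate, Kind target) {
        return willEncrypt || incrementalUpdate
                || target == null || target != Kind.OBJECT_REFERENCE;
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.printf("%-16s original=%-9s proposed=%s%n", kind,
                    originalWritesReference(false, false, kind) ? "reference" : "inline",
                    proposedWritesReference(false, false, kind) ? "reference" : "inline");
        }
    }
}
```

Running {{main}} prints, for each kind of target, whether each condition writes a reference or inlines the value, with encryption and incremental saving switched off.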
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)