[ https://issues.apache.org/jira/browse/PDFBOX-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843728#comment-16843728 ]
Jonathan commented on PDFBOX-4541:
----------------------------------
About {{parseDirObject()}}: we currently do exactly that. We use a parser
subclass anyway to create our referenced streams, so I've just overridden that
method there. But it definitely does have an impact later, when we write. As far
as I understand, objects are currently set to be direct very selectively, and
unless an object is explicitly set to be direct, {{COSWriter}} appends it to a
writing queue, which is not really compatible with our linearisation.
We currently don't even use PDDocument but operate on COSDocument directly, so
using our own writer is not a big deal for us; in fact, we already do. I'd just
like to avoid having to override the entire {{visitFromDictionary()}} and
{{visitFromArray()}} when the only thing we need to change is a single
if-statement.
But I do think a BaseWriter sounds like a good idea. I've just looked through
the class; maybe it would make sense to separate out the incremental updates?
Then the object queue management, {{visitFromDocument()}}, {{visitFromArray()}}
and {{visitFromDictionary()}} would become simpler and could perhaps be
implemented in a subclass.
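A minimal sketch of that split, assuming a template-method design: a base class owns the traversal and the object queue, and a subclass overrides only the one decision that is currently buried in the if-statement. All names here ({{BaseWriter}}, {{keepIndirect()}}, {{DefaultWriter}}, {{LinearizingWriter}} and the {{Scalar}}/{{Ref}}/{{Arr}}/{{Dict}} records) are hypothetical stand-ins, not PDFBox API, and {{DefaultWriter}} only approximates the 2.0.14 condition quoted below with encryption and incremental updates switched off.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BaseWriterSketch {

    // Hypothetical node types standing in for COSBase, COSObject, COSArray and
    // COSDictionary; they are NOT the PDFBox classes.
    interface Node {}
    record Scalar(String text) implements Node {}
    record Ref(int number, Node target) implements Node {}
    record Arr(List<Node> items) implements Node {}
    record Dict(Map<String, Node> entries) implements Node {}

    // Shared traversal and object queue; subclasses decide only how a reference is emitted.
    abstract static class BaseWriter {
        final List<Ref> queue = new ArrayList<>();   // objects to write later as "n 0 obj"
        final StringBuilder out = new StringBuilder();

        // The single decision a subclass overrides:
        // true = queue the object and write "n 0 R", false = write the target inline.
        protected abstract boolean keepIndirect(Ref ref);

        void visit(Node node) {
            if (node instanceof Scalar s) {
                out.append(s.text());
            } else if (node instanceof Arr a) {
                out.append('[');
                for (Node item : a.items()) {
                    out.append(' ');
                    visit(item);
                }
                out.append(" ]");
            } else if (node instanceof Dict d) {
                out.append("<<");
                for (Map.Entry<String, Node> e : d.entries().entrySet()) {
                    out.append(" /").append(e.getKey()).append(' ');
                    visit(e.getValue());
                }
                out.append(" >>");
            } else if (node instanceof Ref r) {
                if (keepIndirect(r)) {
                    queue.add(r);
                    out.append(r.number()).append(" 0 R");
                } else {
                    visit(r.target());
                }
            }
        }
    }

    // Approximates the 2.0.14 rule with encryption and incremental updates off:
    // only dictionaries (and missing targets) stay indirect.
    static class DefaultWriter extends BaseWriter {
        @Override
        protected boolean keepIndirect(Ref ref) {
            return ref.target() == null || ref.target() instanceof Dict;
        }
    }

    // A linearising writer keeps every indirect reference indirect.
    static class LinearizingWriter extends BaseWriter {
        @Override
        protected boolean keepIndirect(Ref ref) {
            return true;
        }
    }

    static String write(BaseWriter writer, Node root) {
        writer.visit(root);
        return writer.out.toString();
    }

    // A page dictionary whose /MediaBox array lives in (hypothetical) object 5.
    static Dict samplePage() {
        Map<String, Node> entries = new LinkedHashMap<>();
        entries.put("MediaBox", new Ref(5, new Arr(List.of(
                new Scalar("0"), new Scalar("0"), new Scalar("612"), new Scalar("792")))));
        return new Dict(entries);
    }

    public static void main(String[] args) {
        System.out.println(write(new DefaultWriter(), samplePage()));     // << /MediaBox [ 0 0 612 792 ] >>
        System.out.println(write(new LinearizingWriter(), samplePage())); // << /MediaBox 5 0 R >>
    }
}
{code}

With this shape, a linearising writer would override one small method instead of copying {{visitFromDictionary()}} and {{visitFromArray()}}: the sample {{/MediaBox}} array is inlined by {{DefaultWriter}} and written as {{5 0 R}} (and queued once) by {{LinearizingWriter}}.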
> Incorrect? handling of direct/indirect objects
> ----------------------------------------------
>
> Key: PDFBOX-4541
> URL: https://issues.apache.org/jira/browse/PDFBOX-4541
> Project: PDFBox
> Issue Type: Wish
> Components: Parsing, Writing
> Affects Versions: 2.0.14
> Reporter: Jonathan
> Priority: Major
> Attachments: broken_censored.pdf, linearized.pdf,
> linearized_withfix.pdf
>
>
> We ran into some issues with blank pages in some of our resulting PDF
> documents. Investigation showed that some referenced objects were never
> actually written, and we then noticed that this was because they lacked the
> `isDirect` flag. We were able to mitigate this issue by adding
> {code:java}
> if (retval != null) {
>     retval.setDirect(true);
> }
> return retval;
> {code}
> at the end of `BaseParser.parseDirObject()`.
> While the PDFs were now displayed correctly, QPDF's check reported erroneous
> hint tables. The offsets there were calculated incorrectly because the
> objects were now written not just once but several times, in places where
> they should merely have been referenced. We eventually resolved this issue by
> replacing the if-condition
> {code:java}
> if (willEncrypt || incrementalUpdate || subValue instanceof COSDictionary || subValue == null)
> {code}
> in `COSWriter.visitFromArray(COSArray)` and
> `COSWriter.visitFromDictionary(COSDictionary)` with
> {code:java}
> if (willEncrypt || incrementalUpdate || subValue == null || !(subValue instanceof COSObject))
> {code}
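To compare the two conditions, here is a small self-contained model. As we read 2.0.14's {{COSWriter}}, the condition is evaluated when a dictionary or array entry is a {{COSObject}}, with {{subValue}} being the dereferenced object: {{true}} queues the object and writes a reference, {{false}} writes {{subValue}} inline. The classes below are hypothetical stand-ins for the COS types, not PDFBox classes; the only relationship carried over is that {{COSStream}} extends {{COSDictionary}}.

{code:java}
public class ConditionSketch {

    // Hypothetical stand-ins for the COS types (NOT the PDFBox classes).
    interface Base {}
    static class Dict implements Base {}
    static class Stream extends Dict {}   // like COSStream extends COSDictionary
    static class Arr implements Base {}
    static class Int implements Base {}
    static class Ref implements Base {}   // plays the role of COSObject

    // Condition quoted from the issue (2.0.14): true = queue + write "n 0 R".
    static boolean originalKeepsReference(boolean willEncrypt, boolean incrementalUpdate, Base subValue) {
        return willEncrypt || incrementalUpdate || subValue instanceof Dict || subValue == null;
    }

    // Replacement condition proposed in the issue.
    static boolean proposedKeepsReference(boolean willEncrypt, boolean incrementalUpdate, Base subValue) {
        return willEncrypt || incrementalUpdate || subValue == null || !(subValue instanceof Ref);
    }

    static String describe(boolean keepsReference) {
        return keepsReference ? "reference" : "inline";
    }

    public static void main(String[] args) {
        // Compare both conditions with encryption and incremental updates off.
        Base[] samples = { new Dict(), new Stream(), new Arr(), new Int(), null };
        for (Base subValue : samples) {
            String name = subValue == null ? "null" : subValue.getClass().getSimpleName();
            System.out.printf("%-7s original=%-9s proposed=%s%n", name,
                    describe(originalKeepsReference(false, false, subValue)),
                    describe(proposedKeepsReference(false, false, subValue)));
        }
    }
}
{code}

With both flags off, the original condition inlines an indirectly referenced array or number, while the proposed one keeps it as a reference. Since a dereferenced value is normally not itself a {{COSObject}}, the proposed condition in effect keeps every indirect reference indirect, so each object is written once and referenced elsewhere.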
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)