[ 
https://issues.apache.org/jira/browse/PDFBOX-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746109#comment-17746109
 ] 

Andreas Lehmkühler commented on PDFBOX-5518:
--------------------------------------------

Marking the threads array as indirect was the easier part. I also had to make 
some adjustments/fixes to add support for indirect COSArrays to the COSWriter. 
I'm hesitant to backport that to the 2.0.x version as there are too many 
differences. Some of the changes to the COSParser were essential, and some of 
the old code looks suboptimal at best when it comes to direct/indirect 
objects. I don't want to put too much effort into this issue and risk any 
regressions.

> "Threads" array in Document Catalog should be an indirect reference
> -------------------------------------------------------------------
>
>                 Key: PDFBOX-5518
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5518
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.26
>            Reporter: Christian Appl
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: image-2022-09-23-09-50-30-766.png, 
> image-2022-09-23-10-03-15-070.png, image-2022-09-23-10-54-31-618.png, 
> image-2023-07-17-10-57-12-609.png, threads-out.pdf
>
>
> *TL;DR:*
> When either of the methods "getThreads" or "setThreads" in class 
> PDDocumentCatalog is used and the resulting document is saved, Adobe 
> Preflight reports an issue with the resulting "Threads" array in the document 
> catalog and claims it should have been an indirect object reference instead 
> of a direct object.
> My claim: The COSWriter should be able to create indirect objects for 
> COSArrays when required.
> *Checking PDF-32000-1:*
> In table 28 "Entries in the catalog dictionary" we can find the following 
> definition:
> !image-2022-09-23-09-50-30-766.png!
> *Determining reasons:*
> 1. The mentioned get and set methods create a COSArray for the entry 
> "Threads" of the catalog dictionary
> 2. The COSWriter assumes that COSArrays should always preferably be 
> written as a direct substructure of a dictionary.
> This may be entirely true for other arrays, but in this case it is the cause 
> of a syntactical error in resulting documents. (It is plausible and possible - 
> but has not been checked - that this causes issues for other structures 
> as well.)
> The COSWriter provides the means to create indirect objects for 
> COSDictionaries; however, it does not (as far as I can see) provide the means 
> to flag a COSArray for the same handling.
> *Possible solutions:*
> As far as I can see the COSWriter would be entirely capable of creating 
> COSObjects for any of the COSBase types, the only thing missing is the 
> ability to mark a COSArray to be written indirectly and the matching handling 
> by the COSWriter.
> Adding something like:
> !image-2022-09-23-10-03-15-070.png!
> at the right places in the COSWriter (similar to the handling of indirect 
> COSDictionaries) seems to do the trick and resolves the issue.
> *Important issue?:*
> I fixed this on our end and hence it is not a pressing issue. Also, "Threads" 
> is not as important and common as other structures, and hence most documents 
> and users won't encounter this issue at all.
> However, it would be nice if this were fixed.
> *Concerning a possible patch:*
> I could provide a patch making the required changes, but would have to adapt 
> it for the current PDFBox 2.0.27-SNAPSHOT as I developed it rather as a 
> hotfix for our mirror of the library.
> And concerning that patch I should mention:
> As can be assumed, "isDirectArray" and "setDirectArray" methods have been 
> added to the COSArray. This is a quick and dirty solution, as it would be 
> preferable for COSArray to use the already existing "direct" field that 
> other COSBase types (COSDictionaries) already use.
> As stated - the solution is quick and dirty and for a final solution in the 
> PDFBox library a cleaner approach would be preferable. Hence I did not 
> provide that patch for now.
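
As a rough illustration of the direct/indirect distinction described in the quoted issue, here is a toy Java model. It is not PDFBox code: the names IndirectArraySketch, ToyArray and write are hypothetical, object 1 is assumed to be the dictionary, and the object numbering is heavily simplified. It only sketches the idea that a writer can check a "direct" flag on an array and emit either the array inline or a reference to a separate object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy model only: none of these types are PDFBox classes. */
final class IndirectArraySketch {

    /** Hypothetical stand-in for an array value that carries a direct/indirect flag. */
    static final class ToyArray {
        final List<Integer> items = new ArrayList<>();
        boolean direct = true; // default: written inline in the owning dictionary
    }

    /** Formats the array body in PDF-like syntax, e.g. "[1 2 3]". */
    static String format(ToyArray array) {
        return array.items.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" ", "[", "]"));
    }

    /**
     * Serializes a dictionary as object 1. Arrays flagged as indirect are
     * written as separate objects and referenced as "n 0 R".
     */
    static String write(Map<String, ToyArray> dictionary) {
        StringBuilder dict = new StringBuilder("<<");
        StringBuilder indirectObjects = new StringBuilder();
        int nextObjectNumber = 2; // assumption: object 1 is the dictionary itself
        for (Map.Entry<String, ToyArray> entry : dictionary.entrySet()) {
            ToyArray array = entry.getValue();
            dict.append(" /").append(entry.getKey()).append(' ');
            if (array.direct) {
                dict.append(format(array)); // inline (direct) array
            } else {
                int number = nextObjectNumber++;
                dict.append(number).append(" 0 R"); // indirect reference
                indirectObjects.append(number).append(" 0 obj\n")
                        .append(format(array)).append("\nendobj\n");
            }
        }
        dict.append(" >>");
        return "1 0 obj\n" + dict + "\nendobj\n" + indirectObjects;
    }

    public static void main(String[] args) {
        ToyArray threads = new ToyArray();
        Map<String, ToyArray> catalog = new LinkedHashMap<>();
        catalog.put("Threads", threads);

        System.out.print(write(catalog)); // array written inline
        threads.direct = false;
        System.out.print(write(catalog)); // array written as object 2
    }
}
```

With the flag left at its default, the toy catalog contains "/Threads []". With the flag cleared, it contains "/Threads 2 0 R" and the array appears as its own object, which is the form Table 28 is reported above to require for "Threads".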



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
