Christian Appl created PDFBOX-5518:
--------------------------------------

             Summary: "Threads" array in Document Catalog should be an indirect reference
                 Key: PDFBOX-5518
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5518
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 2.0.26
            Reporter: Christian Appl
          Attachments: image-2022-09-23-09-50-30-766.png, image-2022-09-23-10-03-15-070.png

*TL;DR:*
When either of the methods "getThreads" or "setThreads" of class 
PDDocumentCatalog is used and the resulting document is saved, Adobe Preflight 
reports an issue with the "Threads" array in the document catalog: it claims 
the array should have been an indirect object reference instead of a direct 
object.

My claim: The COSWriter should be able to create indirect objects for COSArrays 
when required.

*Checking PDF-32000-1:*
In table 28 "Entries in the catalog dictionary" we can find the following 
definition, which states that the value of "Threads" shall be an indirect 
reference:
!image-2022-09-23-09-50-30-766.png!


*Determining reasons:*
1. The mentioned get and set methods create a COSArray for the "Threads" entry 
of the catalog dictionary.
2. The COSWriter assumes that COSArrays should always be written directly, as 
a substructure of the containing dictionary.

This may be entirely correct for other arrays, but in this case it causes a 
syntax error in the resulting documents. (It is plausible, but has not been 
checked, that this causes issues for other structures as well.)

The COSWriter provides the means to write COSDictionaries as indirect objects; 
however, as far as I can see, it does not provide a way to flag a COSArray for 
the same handling.

*Possible solutions:*
As far as I can see, the COSWriter would be entirely capable of creating 
COSObjects for any of the COSBase types. The only things missing are the 
ability to mark a COSArray to be written indirectly and the matching handling 
in the COSWriter.
Adding something like:
!image-2022-09-23-10-03-15-070.png!
at the right places in the COSWriter (similar to the handling of indirect 
COSDictionaries) seems to do the trick and resolves the issue.
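The exact code from the screenshot is not available here, so the following is only a minimal sketch of the proposed rule, assuming it mirrors the existing handling of indirect COSDictionaries. It is NOT PDFBox code: the classes "Node", "Name", "Dict", "Arr" and the "write" method are hypothetical stand-ins for COSBase, COSName, COSDictionary, COSArray and COSWriter. The flag lives on the shared base class, in the spirit of reusing the existing "direct" field mentioned further down. A dictionary or array whose flag is false gets an object number, is queued, and is referenced as "N 0 R"; everything else is written inline.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT PDFBox code: Node/Name/Dict/Arr are hypothetical stand-ins for
// COSBase/COSName/COSDictionary/COSArray, and write() plays the role of COSWriter.
public class IndirectArraySketch {

    // Shared base class carrying the "direct" flag (hypothetical, cf. COSBase).
    abstract static class Node {
        boolean direct = true;
    }
    static final class Name extends Node {
        final String value;
        Name(String value) { this.value = value; }
    }
    static final class Dict extends Node {
        final Map<String, Node> entries = new LinkedHashMap<>();
    }
    static final class Arr extends Node {
        final List<Node> items = new ArrayList<>();
    }

    private final List<Node> queue = new ArrayList<>();                 // objects still to write
    private final Map<Node, Integer> numbers = new IdentityHashMap<>(); // assigned object numbers

    // Writes the root as object 1, then every object queued while writing.
    String write(Dict root) {
        numbers.put(root, 1);
        queue.add(root);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < queue.size(); i++) { // the queue may grow during the loop
            Node obj = queue.get(i);
            out.append(numbers.get(obj)).append(" 0 obj\n");
            writeDirect(obj, out);
            out.append("\nendobj\n");
        }
        return out.toString();
    }

    // Proposed rule: a dictionary OR an array flagged non-direct becomes "N 0 R".
    private void writeValue(Node value, StringBuilder out) {
        boolean container = value instanceof Dict || value instanceof Arr;
        if (container && !value.direct) {
            Integer n = numbers.get(value);
            if (n == null) { // first reference: assign a number and queue the object
                n = numbers.size() + 1;
                numbers.put(value, n);
                queue.add(value);
            }
            out.append(n).append(" 0 R");
        } else {
            writeDirect(value, out);
        }
    }

    // Serializes a node's own content inline.
    private void writeDirect(Node node, StringBuilder out) {
        if (node instanceof Name) {
            out.append('/').append(((Name) node).value);
        } else if (node instanceof Arr) {
            out.append('[');
            for (Node item : ((Arr) node).items) {
                out.append(' ');
                writeValue(item, out);
            }
            out.append(" ]");
        } else if (node instanceof Dict) {
            out.append("<<");
            for (Map.Entry<String, Node> e : ((Dict) node).entries.entrySet()) {
                out.append(" /").append(e.getKey()).append(' ');
                writeValue(e.getValue(), out);
            }
            out.append(" >>");
        }
    }

    // Catalog with a one-element "Threads" array; the thread dictionary is indirect.
    static Dict sampleCatalog(boolean indirectThreads) {
        Dict thread = new Dict();
        thread.direct = false;
        thread.entries.put("Type", new Name("Thread"));
        Arr threads = new Arr();
        threads.items.add(thread);
        threads.direct = !indirectThreads;
        Dict catalog = new Dict();
        catalog.entries.put("Type", new Name("Catalog"));
        catalog.entries.put("Threads", threads);
        return catalog;
    }

    public static void main(String[] args) {
        System.out.print(new IndirectArraySketch().write(sampleCatalog(false)));
        System.out.print(new IndirectArraySketch().write(sampleCatalog(true)));
    }
}
```

With the array left direct, the catalog is written as "<< /Type /Catalog /Threads [ 2 0 R ] >>", which is the shape Preflight complains about. With the array flagged non-direct, the catalog contains "/Threads 2 0 R" and the array itself becomes object 2 ("[ 3 0 R ]").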

*Important issue?*
I have fixed this on our end, so it is not a pressing issue. Also, "Threads" 
is less important and less common than other structures, so most documents 
and users will probably not encounter this issue at all.

However, it would be nice if this were fixed.

*Concerning a possible patch:*
I could provide a patch with the required changes, but I would have to adapt it 
to the current PDFBox 2.0.27-SNAPSHOT, as I developed it as a hotfix for our 
mirror of the library.

Regarding that patch, I should mention:
As can be assumed, "isDirectArray" and "setDirectArray" methods have been 
added to COSArray. This is a quick and dirty solution; it would be preferable 
for COSArray to use the already existing "direct" field that other COSBase 
types (COSDictionaries) already use.

As stated, the solution is quick and dirty, and a final solution in the PDFBox 
library should take a cleaner approach. Hence I have not provided the patch 
for now.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
