[jira] [Created] (PDFBOX-6009) Splitter does not include structure tree in documents past the first split

Tilman Hausherr (Jira) Thu, 15 May 2025 10:52:05 -0700

Tilman Hausherr created PDFBOX-6009:
---------------------------------------


             Summary: Splitter does not include structure tree in documents past the first split
                 Key: PDFBOX-6009
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6009
             Project: PDFBox
          Issue Type: Bug
          Components: Utilities
            Reporter: Tilman Hausherr
         Attachments: pdfbox-split-missing-tags_mail 15.5.2025-p1.pdf,
                      pdfbox-split-missing-tags_mail 15.5.2025-p2.pdf,
                      pdfbox-split-missing-tags_mail 15.5.2025-p3.pdf,
                      pdfbox-split-missing-tags_mail 15.5.2025.pdf

As reported by Alastair Porter on the users mailing list:

java -jar pdfbox/app/target/pdfbox-app-4.0.0-SNAPSHOT.jar split -i input.pdf -outputPrefix output-split

Only the first split document has the appropriate structure tree; in the later ones the /K entry is missing.

=== from the post in the mailing list ===
In the first file, I correctly see the /K element. What's more, this element 
has correctly been pruned and doesn't include any items from the input document 
which point to pages that are not in this split.
In subsequent split files, I see no /K element in the StructTreeRoot at all.

I attached a PDF which I've been using for simple testing, which exhibits this 
behaviour.

I had a bit of a look through the existing code, and I see that in 
Splitter.java, in cloneStructureTree
{code:java}
COSBase k1 = srcStructureTreeRoot.getK();
COSBase k2 = new KCloner(dstPageTree).createClone(k1, dstStructureTreeRoot.getCOSObject(), null);
dstStructureTreeRoot.setK(k2);
{code}
k2 is always null after the first split; it seems it may not be created 
correctly.
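
To make the expected result concrete, here is a toy model of the pruning described in the post. It is only a sketch in plain Java and is not PDFBox's KCloner: the names StructTreePruneSketch, Elem and prune are hypothetical, and a "page" is just an integer index. It assumes an element is kept when it sits on a kept page or has at least one kept descendant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StructTreePruneSketch {

    /** Hypothetical stand-in for a structure element; pageIndex < 0 means "no page of its own". */
    static final class Elem {
        final String type;
        final int pageIndex;
        final List<Elem> kids = new ArrayList<>();

        Elem(String type, int pageIndex, Elem... kids) {
            this.type = type;
            this.pageIndex = pageIndex;
            this.kids.addAll(List.of(kids));
        }
    }

    /**
     * Returns a pruned copy of src that keeps only content on the given pages,
     * or null if nothing under src refers to any of those pages.
     */
    static Elem prune(Elem src, Set<Integer> keptPages) {
        List<Elem> keptKids = new ArrayList<>();
        for (Elem kid : src.kids) {
            Elem clone = prune(kid, keptPages);
            if (clone != null) {
                keptKids.add(clone);
            }
        }
        boolean onKeptPage = keptPages.contains(src.pageIndex);
        if (!onKeptPage && keptKids.isEmpty()) {
            return null; // nothing left for this split
        }
        Elem copy = new Elem(src.type, src.pageIndex);
        copy.kids.addAll(keptKids);
        return copy;
    }

    public static void main(String[] args) {
        // Three-page document with one paragraph per page.
        Elem doc = new Elem("Document", -1,
                new Elem("P", 0), new Elem("P", 1), new Elem("P", 2));
        for (int page = 0; page < 3; page++) {
            Elem k = prune(doc, Set.of(page));
            System.out.println("split " + (page + 1) + ": "
                    + (k == null ? "no /K" : k.kids.size() + " kid(s) under " + k.type));
        }
    }
}
```

In this model every single-page split ends up with a non-null root holding the one paragraph on its page. That is the outcome the post expects for the second and third split files as well, not only for the first.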



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

