Tilman Hausherr created PDFBOX-6009:
---------------------------------------

             Summary: Splitter does not include structure tree in documents past the first split
                 Key: PDFBOX-6009
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6009
             Project: PDFBox
          Issue Type: Bug
          Components: Utilities
            Reporter: Tilman Hausherr
         Attachments: pdfbox-split-missing-tags_mail 15.5.2025-p1.pdf, pdfbox-split-missing-tags_mail 15.5.2025-p2.pdf, pdfbox-split-missing-tags_mail 15.5.2025-p3.pdf, pdfbox-split-missing-tags_mail 15.5.2025.pdf

As submitted by Alastair Porter in the users mailing list:

java -jar pdfbox/app/target/pdfbox-app-4.0.0-SNAPSHOT.jar split -i input.pdf -outputPrefix output-split

Only the first split file has the appropriate structure tree; in the later files, /K is missing.

=== from the post in the mailing list ===

In the first file, I correctly see the /K element. What's more, this element has correctly been pruned and doesn't include any items from the input document which point to pages that are not in this split.

In subsequent split files, I see no /K element in the StructTreeRoot at all. I attached a PDF which I've been using for simple testing, which exhibits this behaviour.

I had a bit of a look through the existing code, and I see that in Splitter.java, in cloneStructureTree

{code:java}
COSBase k1 = srcStructureTreeRoot.getK();
COSBase k2 = new KCloner(dstPageTree).createClone(k1, dstStructureTreeRoot.getCOSObject(), null);
dstStructureTreeRoot.setK(k2);
{code}

k2 is always null after the first split; it seems like it may not be created correctly.
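
To make the expected behaviour concrete, here is a minimal sketch in plain Java that does not use PDFBox at all. It models the pruning described above: every split should receive a copy of the structure tree that keeps only the elements pointing at its own pages, and the result should be non-null whenever the split contains tagged content. The names `StructTreePruneSketch`, `Node` and `prune` are hypothetical and exist only for illustration; this is not how `KCloner` is implemented, only a toy model of the result one would expect from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StructTreePruneSketch {

    /** Hypothetical stand-in for a structure element; page == -1 means "no page of its own". */
    static final class Node {
        final String type;
        final int page;
        final List<Node> kids = new ArrayList<>();

        Node(String type, int page, Node... children) {
            this.type = type;
            this.page = page;
            for (Node child : children) {
                kids.add(child);
            }
        }
    }

    /** Returns a pruned copy that keeps only content on keptPages, or null if nothing remains. */
    static Node prune(Node src, Set<Integer> keptPages) {
        Node copy = new Node(src.type, src.page);
        for (Node kid : src.kids) {
            Node prunedKid = prune(kid, keptPages);
            if (prunedKid != null) {
                copy.kids.add(prunedKid);
            }
        }
        boolean onKeptPage = src.page >= 0 && keptPages.contains(src.page);
        // Keep the element if it sits on a kept page or still has surviving kids.
        return (onKeptPage || !copy.kids.isEmpty()) ? copy : null;
    }

    public static void main(String[] args) {
        // Three-page toy document with one tagged paragraph per page (assumed layout).
        Node root = new Node("Document", -1,
                new Node("P", 0), new Node("P", 1), new Node("P", 2));

        // One split per page, mirroring the default one-page-per-file split.
        for (int page = 0; page < 3; page++) {
            Node k = prune(root, Set.of(page));
            System.out.println("split " + (page + 1) + ": "
                    + (k == null ? "no /K" : k.kids.size() + " kid(s) kept"));
        }
    }
}
```

In this model every split, not just the first, ends up with a non-null pruned tree containing exactly one kid. That is the behaviour the report expects from the second and third output files as well.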