Tilman Hausherr created PDFBOX-6009:
---------------------------------------
Summary: Splitter does not include structure tree in documents past the first split
Key: PDFBOX-6009
URL: https://issues.apache.org/jira/browse/PDFBOX-6009
Project: PDFBox
Issue Type: Bug
Components: Utilities
Reporter: Tilman Hausherr
Attachments: pdfbox-split-missing-tags_mail 15.5.2025-p1.pdf,
pdfbox-split-missing-tags_mail 15.5.2025-p2.pdf, pdfbox-split-missing-tags_mail
15.5.2025-p3.pdf, pdfbox-split-missing-tags_mail 15.5.2025.pdf
As submitted by Alastair Porter on the users mailing list:
java -jar pdfbox/app/target/pdfbox-app-4.0.0-SNAPSHOT.jar split -i input.pdf
-outputPrefix output-split
Only the first split document has the appropriate structure tree; in the later ones, /K is missing.
=== from the post in the mailing list ===
In the first file, I correctly see the /K element. What's more, this element
has correctly been pruned and doesn't include any items from the input document
which point to pages that are not in this split.
In subsequent split files, I see no /K element in the StructTreeRoot at all.
I attached a PDF which I've been using for simple testing, which exhibits this
behaviour.
I had a bit of a look through the existing code, and I see this in
cloneStructureTree in Splitter.java:
{code:java}
COSBase k1 = srcStructureTreeRoot.getK();
COSBase k2 = new KCloner(dstPageTree).createClone(k1,
dstStructureTreeRoot.getCOSObject(), null);
dstStructureTreeRoot.setK(k2);
{code}
k2 is always null after the first split, so it seems that it may not be created
correctly.
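To make the expected behaviour concrete, the pruning described above (each split keeps only the structure items that point to its own pages) can be sketched with a toy model. This is not PDFBox code and not how KCloner is implemented: the class StructTreePruneSketch, the Node type and the prune method are all hypothetical names invented for illustration, and a plain page number stands in for a page reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model only: NOT PDFBox code. All names here are hypothetical. */
public class StructTreePruneSketch {

    /** Stand-in for a structure element. In this model only leaves carry a page. */
    static final class Node {
        final String type;
        final Integer page;                 // null for pure grouping elements
        final List<Node> kids = new ArrayList<>();

        Node(String type, Integer page) {
            this.type = type;
            this.page = page;
        }

        Node add(Node kid) {
            kids.add(kid);
            return this;
        }
    }

    /**
     * Returns a pruned copy of the given node, or null if neither the node
     * nor anything below it refers to a page in keptPages.
     */
    static Node prune(Node node, Set<Integer> keptPages) {
        Node copy = new Node(node.type, node.page);
        for (Node kid : node.kids) {
            Node prunedKid = prune(kid, keptPages);
            if (prunedKid != null) {
                copy.kids.add(prunedKid);
            }
        }
        boolean ownPageKept = node.page != null && keptPages.contains(node.page);
        return (ownPageKept || !copy.kids.isEmpty()) ? copy : null;
    }

    public static void main(String[] args) {
        // Three-page document: one paragraph on page 1, a section spanning pages 2 and 3.
        Node root = new Node("Document", null)
                .add(new Node("P", 1))
                .add(new Node("Sect", null)
                        .add(new Node("P", 2))
                        .add(new Node("P", 3)));

        // Split one page per output document, as the split command does by default.
        for (int page = 1; page <= 3; page++) {
            Node k = prune(root, Set.of(page));
            System.out.println("split " + page + ": K "
                    + (k == null ? "missing" : "present, " + k.kids.size() + " kid(s)"));
        }
    }
}
```

In this model every split whose page has tagged content gets a non-null pruned tree, which is the result one would expect for k2 in each split document, not only the first. The model returns null only when nothing in the tree refers to a kept page.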