[
https://issues.apache.org/jira/browse/PDFBOX-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319259#comment-16319259
]
Dave Hill commented on PDFBOX-4007:
-----------------------------------
I tried to find an example PDF that illustrates my earlier comment, "When we
dig into the output we still find orphaned pages", but I cannot reproduce the
PDF that prompted it using the current 3.0-SNAPSHOT. When I made that comment,
I recall I was using human-readable PDFs; looking through the output, I saw
that pages were duplicated but that the duplicates did not tie back to the
root object. This is what I meant by "more effectively orphaned". The output I
now get from the development head (with and without the tag I proposed) fails
for a number of different reasons, depending on the combination of test files
I try and the order I try them in. I see output with no tags, with mangled
tags, and even one case where the tagged page is completely missing.
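
To make "orphaned" concrete, here is a minimal sketch of the check I was doing
by eye in the human-readable output: start at the root object, follow every
reference, and report whatever is never reached. It is a toy model only. The
class OrphanFinder, the method findOrphans, and the object numbers are
hypothetical and are not part of PDFBox; a real check would have to walk the
actual object graph of the saved file.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only (not PDFBox API): maps an object number to the object
// numbers it references. All names here are hypothetical.
class OrphanFinder {

    /** Returns the object numbers in refs that cannot be reached from root. */
    static Set<Integer> findOrphans(Map<Integer, List<Integer>> refs, int root) {
        Set<Integer> reachable = new HashSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            int current = pending.pop();
            if (!reachable.add(current)) {
                continue; // already visited; guards against reference cycles
            }
            for (int target : refs.getOrDefault(current, List.of())) {
                pending.push(target);
            }
        }
        // Everything that was declared but never reached is an orphan.
        Set<Integer> orphans = new TreeSet<>(refs.keySet());
        orphans.removeAll(reachable);
        return orphans;
    }

    public static void main(String[] args) {
        // 1 = catalog, 2 = page tree, 3 and 4 = pages listed by the page tree,
        // 5 = duplicated page that points at the page tree as its parent but
        // is not listed by it, so nothing leads back to it from the root.
        Map<Integer, List<Integer>> refs = Map.of(
                1, List.of(2),
                2, List.of(3, 4),
                3, List.of(2),
                4, List.of(2),
                5, List.of(2));
        System.out.println(findOrphans(refs, 1)); // prints [5]
    }
}
```

Object 5 in the example is the kind of page I meant: it is present in the file
and refers to its parent, but no path from the root leads to it.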
> Merged documents don't retain tags
> ----------------------------------
>
> Key: PDFBOX-4007
> URL: https://issues.apache.org/jira/browse/PDFBOX-4007
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.0.8
> Reporter: Dave Hill
> Priority: Minor
> Labels: StructureTree, merge
> Attachments: HelloWorldTagged.pdf, PDFMergeUtility-2.patch,
> PDFMergeUtility.patch, Tagged+GeneralForbearance-Merged.pdf, Tagged.pdf
>
>
> Certain combinations of documents don't retain tags when merged. The document
> [^Tagged.pdf] is just a basic one-word PDF created and tagged with Pro DC. If
> you try to merge this with the government [General Forbearance
> form|https://studentloans.gov/myDirectLoan/downloadForm.action?searchType=library&shortName=general&localeCode=en-us]
> the output crashes DC when you try to view the tags. If you use a flattened
> version of the General Forbearance form then the tags are just munged.
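> {code}
> import java.io.File;
> import org.apache.pdfbox.multipdf.PDFMergerUtility;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> // Class name is illustrative only.
> public class MergeTagged {
>     public static void main(String[] args) throws Exception {
>         PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>         PDDocument src = PDDocument.load(new File("Tagged.pdf"));
>         PDDocument dest = PDDocument.load(new File("GeneralForbearance.pdf"));
>         // Append the tagged one-word document to the form.
>         pdfMergerUtility.appendDocument(dest, src);
>         src.close();
>         // The saved output is where the tags are missing or mangled.
>         dest.save(new File("BrokenTags.pdf"));
>         dest.close();
>     }
> }
> {code}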
> The included patch appears to make tagging more reliable, but I'm still
> relying heavily on cloning, which can apparently cause other issues. The
> documents I get out with this code seem to present correctly in Adobe readers
> for all combinations of documents that I tested against.
> My patch is made and tested against yesterday's production head and it
> includes my changes from
> [PDFBOX-3999|https://issues.apache.org/jira/browse/PDFBOX-3999] since it is
> in the exact same place in the code.
> This is a blocker for 508 compliance of merged documents, but I guessed it
> to be more of a minor issue in the overall scheme of things; please correct
> me if I am mistaken.
> Thanks!