Dave Hill created PDFBOX-4007:
---------------------------------

             Summary: Merged documents don't retain tags
                 Key: PDFBOX-4007
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4007
             Project: PDFBox
          Issue Type: Bug
          Components: Utilities
    Affects Versions: 2.0.8
            Reporter: Dave Hill
            Priority: Minor
         Attachments: Tagged.pdf, pdfbox.patch

Certain combinations of documents don't retain their tags when merged. The document 
[^Tagged.pdf] is a basic one-word PDF created and tagged with Acrobat Pro DC. If 
you try to merge it with the government [General Forbearance 
form|https://studentloans.gov/myDirectLoan/downloadForm.action?searchType=library&shortName=general&localeCode=en-us],
the output crashes DC when you try to view the tags. If you use a flattened 
version of the General Forbearance form instead, the tags are just garbled.

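{code}
import java.io.File;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;

// Wrapper class name is arbitrary; it only makes the snippet compile.
public class MergeTagged {
    public static void main(String[] args) throws Exception {
        PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
        // Tagged one-word document (attached) and the downloaded form
        PDDocument src = PDDocument.load(new File("Tagged.pdf"));
        PDDocument dest = PDDocument.load(new File("GeneralForbearance.pdf"));
        // Append the tagged document's pages to the form
        pdfMergerUtility.appendDocument(dest, src);
        src.close();
        // The saved file is the one whose tags are broken
        dest.save(new File("BrokenTags.pdf"));
        dest.close();
    }
}
{code}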

The attached patch appears to make tagging more reliable, but I'm still relying 
heavily on cloning, which can apparently cause other issues. The documents I 
get out with this code seem to present correctly in Adobe readers for all 
combinations of documents that I tested.

My patch was made and tested against yesterday's production head, and it includes 
my changes from [PDFBOX-3999|https://issues.apache.org/jira/browse/PDFBOX-3999] 
since both touch the exact same place in the code.

This is a blocker for Section 508 compliance of merged documents, but I 
guessed it to be more of a minor issue in the overall scheme of things. Please 
correct me if I am mistaken.

Thanks!




