[ 
https://issues.apache.org/jira/browse/PDFBOX-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269509#comment-16269509
 ] 

Dave Hill edited comment on PDFBOX-4007 at 11/28/17 9:41 PM:
-------------------------------------------------------------

This patch is another attempt at merging documents with mixed tags. This 
version removes the unwanted orphans introduced through cloning. Not the most 
optimal approach but it is working and green and produces properly structured 
output.


was (Author: davesplanet):
Another attempt at merging documents with mixed tags. This version removes the 
unwanted orphans introduced through cloning. Not the most optimal approach but 
it is working and green and produces properly structured output.

> Merged documents don't retain tags
> ----------------------------------
>
>                 Key: PDFBOX-4007
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4007
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.8
>            Reporter: Dave Hill
>            Priority: Minor
>              Labels: StructureTree, merge
>         Attachments: HelloWorldTagged.pdf, PDFMergeUtility-2.patch, 
> PDFMergeUtility.patch, Tagged+GeneralForbearance-Merged.pdf, Tagged.pdf
>
>
> Certain combinations of documents don't retain tags when merged. The document 
> [^Tagged.pdf] is just a basic one word PDF created and tagged with Pro DC. If 
> you try to merge this with the government [General Forbearance 
> form|https://studentloans.gov/myDirectLoan/downloadForm.action?searchType=library&shortName=general&localeCode=en-us]
>  the output crashes DC when you try to view the tags. If you use a flattened 
> version of the General Forbearance form then the tags are just munged.
> {code}
>     public static void main(String[] args) throws Exception {
>         PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>         PDDocument src = PDDocument.load(new File("Tagged.pdf"));
>         PDDocument dest = PDDocument.load(new File("GeneralForbearance.pdf"));
>         pdfMergerUtility.appendDocument(dest, src);
>         src.close();
>         dest.save(new File("BrokenTags.pdf"));
>         dest.close();
>     }
> {code}
> The included patch appears to make tagging more reliable, but I'm still 
> relying heavily on cloning which can apparently cause other issues.  The 
> documents I get out with this code seem present correctly in Adobe readers 
> for all combinations of documents that I tested against.
> My patch is made and tested against yesterdays production head and it 
> includes my changes from 
> [PDFBOX-3999|https://issues.apache.org/jira/browse/PDFBOX-3999] since it is 
> in the exact same place in the code.
> The priority of this is a blocker for 508 compliance of merged documents but 
> I guessed it to be more of a minor issue in the overall scheme of things, 
> please correct me if I am mistaken.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to