[
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721174#comment-13721174
]
Kirk Haines commented on PDFBOX-1511:
-------------------------------------
I have also experienced this (Windows 7, Java 1.6.0_35-b10 64-bit) in PDFBox
1.7.1 through the current trunk. I tried Maruan's suggestion and it resolved the
issue, at the expense of creating unnecessary duplicate resources. I had
noticed that the corruption in subsequent documents resulted in those pages
having their formatting preserved, but the text content had many letters
substituted (all 'd' replaced by 'f', all 'y' replaced by 'd', etc.). I also
found that the degree of corruption depended on how similar the beginning text
content of each input document was. When there was a common header in the
documents being merged, there were only a few substitutions. When it was
merging a document with itself, there were no errors. When the document header
was very different, the resulting text was undecipherable garbage. This made
me suspect that it may be a problem with the deflate compression being applied
to the stream. I thought that it might be using the (compression) dictionary
from the first document and copying the physical bytes from the source document
rather than reading the logical bytes and allowing the deflate filter in
the context of the destination document to re-encode them.
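
The suspicion above can be illustrated with a small sketch using only
java.util.zip. It is not taken from PDFBox and makes no claim about how PDFBox
actually encodes or copies streams; the class name, the helper names and the
sample strings are all made up for illustration. It assumes raw deflate with a
preset dictionary, then decodes the same physical bytes once with the
dictionary they were written with and once with a different one.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only: none of these names exist in PDFBox.
class DictionaryMismatchSketch {

    // Turn logical bytes into physical (raw deflate) bytes using a preset dictionary.
    static byte[] encode(byte[] logical, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw deflate
        deflater.setDictionary(dictionary);
        deflater.setInput(logical);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Turn physical bytes back into logical bytes using whatever dictionary the reader has.
    static byte[] decode(byte[] physical, byte[] dictionary) {
        Inflater inflater = new Inflater(true); // true = raw deflate
        inflater.setDictionary(dictionary);
        inflater.setInput(physical);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // nothing more can be decoded
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("physical bytes not decodable in this context", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Two made-up "document contexts" with similar, equally long headers.
        byte[] dictA = "Invoice header for customer A".getBytes(StandardCharsets.US_ASCII);
        byte[] dictB = "Invoice header for customer B".getBytes(StandardCharsets.US_ASCII);
        byte[] logical = "Invoice header for customer B: total 10".getBytes(StandardCharsets.US_ASCII);

        // The source document writes its stream in its own context (B).
        byte[] physical = encode(logical, dictB);

        // Copying the physical bytes and reading them in the destination context (A).
        String copiedRaw = new String(decode(physical, dictA), StandardCharsets.US_ASCII);

        // Reading the logical bytes in the source context, then re-encoding for the destination.
        byte[] reencoded = encode(decode(physical, dictB), dictA);
        String copiedLogical = new String(decode(reencoded, dictA), StandardCharsets.US_ASCII);

        System.out.println("physical copy: " + copiedRaw);
        System.out.println("logical copy:  " + copiedLogical);
    }
}
```

In this sketch the logical copy always round-trips, while the physical copy
is only as correct as the two dictionaries are similar, which is the kind of
behaviour described above. Whether anything comparable happens inside the
merger is exactly the open question.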
> pdfMerger App produces Garbage
> ------------------------------
>
> Key: PDFBOX-1511
> URL: https://issues.apache.org/jira/browse/PDFBOX-1511
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.7.1
> Environment: Win XP; Windows Server 2008 R2; java version "1.6.0_21",
> Reporter: Michael Huber
> Attachments: 1.pdf, 2.pdf, PdfRenderer.java, targetPdfMergeJava.pdf,
> targetPdfMergeUtilityApp.pdf
>
>
> pdfbox Utility pdfMerger produces a merged document containing garbage. All
> merged pdf files are contained but Strings are destroyed.
> The source pdf files are created with graphviz and are readable without error
> or disturbance both with Acrobat X and pdfbox pdfDebug Utility.
> Another astounding thing is that a handcoded merger using pdfMergerUtility
> class works fine when run within Eclipse Juno and creates same garbage when
> run from cmd line (pls. see attached source)
> I checked everything that comes to mind to find the differences, e.g. Java
> version, encoding/codepage issues, memory settings, found nothing.