[ https://issues.apache.org/jira/browse/PDFBOX-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Costermans updated PDFBOX-2015:
-----------------------------------
Attachment: XRefStm_not_updated.patch
Word2010.pdf
modified_Word2010.pdf
> Hybrid reference PDF still contains XRefStm info in the trailer dictionary
> after PDDocument#save
> ----------------------------------------------------------------------------------------------
>
> Key: PDFBOX-2015
> URL: https://issues.apache.org/jira/browse/PDFBOX-2015
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 1.8.4
> Reporter: Tim Costermans
> Attachments: Word2010.pdf, XRefStm_not_updated.patch,
> modified_Word2010.pdf
>
>
> From: Tim Costermans [mailto:[email protected]]
> Sent: Monday 31 March 2014 12:57
> To: [email protected]
> Subject: RE: PDFBox 1.8.4 and pdf's generated by MS Word
> Hello,
> I’ve written a test case to reproduce the issue (see the attached patch).
> Could someone have a look at it and give me some pointers on how to solve
> this issue? I applied the patch to the 1.8.4 tag, which I checked out locally.
> The problem is that I don’t know the PDF spec, so I don’t know how to fix
> this in the PDFBox source code.
> Word2010.pdf is the input PDF. I open the document with PDFBox, add a string
> to it (in this case ‘Hello world!’), and then save it.
> When I compare the raw content of the PDF before and after modifying it
> (using Notepad++), I see this:
> Word2010.pdf:
> Line 647: <</Size 18/Root 1 0 R/Info 7 0 R/ID[<AE9AF29D5A22AE47B47C4DA29170BE64><AE9AF29D5A22AE47B47C4DA29170BE64>] /Prev 81972/XRefStm 81702>>
> modified_Word2010.pdf:
> Line 791: /XRefStm 81702
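The before/after comparison above comes down to two integer entries in the trailer dictionary: /Prev (the byte offset of the previous cross-reference section) and /XRefStm (the byte offset of the cross-reference stream in a hybrid-reference file). The Java sketch below pulls those values out of a trailer string with a regular expression. It is a naive illustration, not PDFBox code: the class and method names are hypothetical, and it does not handle nested dictionaries, string literals, or comments the way a real PDF parser would.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: naive regex extraction of integer trailer entries.
// Not a real PDF parser (ignores nesting, string literals and comments).
public class TrailerOffsets {

    // Reads "/<key> <integer>"; the negative lookahead keeps "/Prev" from
    // also matching a longer name that merely starts with "Prev".
    public static Optional<Long> readOffset(String trailer, String key) {
        Pattern p = Pattern.compile("/" + Pattern.quote(key) + "(?![A-Za-z0-9])\\s*(\\d+)");
        Matcher m = p.matcher(trailer);
        return m.find() ? Optional.of(Long.parseLong(m.group(1))) : Optional.empty();
    }

    public static Optional<Long> prev(String trailer) {
        return readOffset(trailer, "Prev");
    }

    public static Optional<Long> xrefStm(String trailer) {
        return readOffset(trailer, "XRefStm");
    }

    public static void main(String[] args) {
        // Trailer dictionary from Word2010.pdf, as quoted in the report.
        String trailer = "<</Size 18/Root 1 0 R/Info 7 0 R"
                + "/ID[<AE9AF29D5A22AE47B47C4DA29170BE64><AE9AF29D5A22AE47B47C4DA29170BE64>]"
                + " /Prev 81972/XRefStm 81702>>";
        System.out.println("Prev=" + prev(trailer).orElse(-1L)
                + " XRefStm=" + xrefStm(trailer).orElse(-1L));
        // prints "Prev=81972 XRefStm=81702"
    }
}
```

Reading the same XRefStm value (81702) from both the original and the saved file is the symptom reported here: the entry is carried over unchanged instead of being recomputed or removed.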
> The XRefStm value is not updated, even though the multiple revisions of the
> original PDF were merged into a new PDF document.
> A third-party library we use depends on this XRefStm value and cannot open
> the PDF after it has been modified (see my previous message for the stack trace).
> Any help would be much appreciated.
> Kind regards,
> Tim Costermans
> Hi Tim,
> that’s a bug.
> Explanation: The original file uses what’s called a hybrid reference. That’s
> for compatibility with readers that do not support compressed cross-reference
> streams. The file generated by PDFBox no longer uses hybrid references
> but still contains the XRefStm entry in the trailer dictionary.
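In a hybrid-reference file, the /XRefStm entry in the classic trailer gives the byte offset of a cross-reference stream (an object whose dictionary has /Type /XRef) that lists the objects stored in compressed object streams. Readers that understand PDF 1.5 follow it, and older readers ignore it. Once PDFBox rewrites the file, offset 81702 presumably no longer lands on such a stream, which would explain why the third-party reader fails. The Java sketch below checks whether an offset still points to a cross-reference stream. It is a hypothetical helper, not part of PDFBox, and it only inspects the object header and the dictionary text up to the `stream` or `endobj` keyword within an arbitrary 1024-byte window.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper: does an /XRefStm offset still point to a cross-reference stream?
public class XRefStmCheck {

    // "N G obj <<" followed by "/Type /XRef" before any "stream" or "endobj" keyword.
    private static final Pattern XREF_STREAM_HEADER = Pattern.compile(
            "\\s*\\d+\\s+\\d+\\s+obj\\s*<<(?:(?!endobj|stream).)*?/Type\\s*/XRef(?![A-Za-z0-9])",
            Pattern.DOTALL);

    // Only this many bytes after the offset are inspected (an arbitrary limit).
    private static final int WINDOW = 1024;

    public static boolean pointsToXRefStream(byte[] pdf, long offset) {
        if (offset < 0 || offset >= pdf.length) {
            return false; // a stale offset may even lie past the end of the new file
        }
        int start = (int) offset;
        int len = Math.min(pdf.length - start, WINDOW);
        // ISO-8859-1 maps each byte to exactly one char, so char indices equal byte offsets.
        String tail = new String(pdf, start, len, StandardCharsets.ISO_8859_1);
        // lookingAt() anchors the match at the offset itself.
        return XREF_STREAM_HEADER.matcher(tail).lookingAt();
    }

    public static void main(String[] args) {
        // Tiny synthetic file: a page object followed by an (empty) xref stream object.
        String body = "%PDF-1.5\n"
                + "5 0 obj\n<< /Type /Page >>\nendobj\n"
                + "9 0 obj\n<< /Type /XRef /Size 10 /W [1 2 1] /Length 0 >>\nstream\n\nendstream\nendobj\n";
        byte[] pdf = body.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(pointsToXRefStream(pdf, body.indexOf("9 0 obj"))); // true
        System.out.println(pointsToXRefStream(pdf, body.indexOf("5 0 obj"))); // false
    }
}
```

The check only detects the inconsistency. Going by the explanation above, a saved file would pass it if the stale /XRefStm entry were dropped (since no hybrid reference is written) or if a real cross-reference stream were written at the new offset.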
> Could you please file an issue at https://issues.apache.org/jira/browse/PDFBOX ?
> BR
> Maruan
--
This message was sent by Atlassian JIRA
(v6.2#6252)