[
https://issues.apache.org/jira/browse/PDFBOX-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Costermans updated PDFBOX-2015:
-----------------------------------
Description:
Word2010.pdf is the input PDF. I open the document with PDFBox and add a
string to it, in this case ‘Hello world!’.
Afterwards I save the PDF.
If I look at the content of the pdf before and after I modified it (using
Notepad++) I see this:
Word2010.pdf:
Line 647: <</Size 18/Root 1 0 R/Info 7 0
R/ID[<AE9AF29D5A22AE47B47C4DA29170BE64><AE9AF29D5A22AE47B47C4DA29170BE64>]
/Prev 81972/XRefStm 81702>>
modified_Word2010.pdf:
Line 791: /XRefStm 81702
XRefStm is not updated, although the original PDF had multiple revisions that
were merged into a new PDF document.
A third-party library we use depends on this XRefStm value and cannot open the
PDF after it was modified. (For the stack trace, see my previous message.)
Any help would be much appreciated.
Maruan:
that’s a bug.
Explanation: The original file uses what’s called a hybrid reference. That’s
for compatibility with readers which do not support compressed reference
streams. The file generated by PDFBox doesn’t use hybrid references any more
but still contains the XRefStm info in the trailer dictionary.
See
http://mail-archives.apache.org/mod_mbox/pdfbox-users/201403.mbox/%3C4425DF0D5759D64AA8845AA3EC444E1D014AE30AB3%40EXCHANGE03.unifiedpost.com%3E
for more info.
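The stale entry described above can be checked for mechanically. In a
hybrid-reference file, /XRefStm holds the byte offset of a cross-reference
stream, so a rough test is whether the bytes at that offset start an indirect
object whose dictionary contains /Type /XRef. What follows is only a minimal
sketch under assumptions: it is a stand-alone Python script, not PDFBox code,
the function name check_xrefstm_offsets is hypothetical, and it scans raw
bytes heuristically instead of parsing the PDF properly.

```python
import re

# Matches every "/XRefStm <offset>" entry that appears in plain text.
XREFSTM_RE = re.compile(rb"/XRefStm\s+(\d+)")
# Matches the start of an indirect object: "<num> <gen> obj".
OBJ_HEADER_RE = re.compile(rb"\s*\d+\s+\d+\s+obj\b")


def check_xrefstm_offsets(pdf_bytes: bytes) -> list[tuple[int, bool]]:
    """Return (offset, looks_valid) for each /XRefStm entry found.

    Hypothetical helper: looks_valid is True only if the offset lies inside
    the file and points at an object whose dictionary has /Type /XRef.
    """
    results = []
    for match in XREFSTM_RE.finditer(pdf_bytes):
        offset = int(match.group(1))
        header = None
        if offset < len(pdf_bytes):
            header = OBJ_HEADER_RE.match(pdf_bytes, offset)
        looks_valid = False
        if header is not None:
            # Inspect only the dictionary part, i.e. everything before the
            # first "stream" or "endobj" keyword after the object header.
            tail = pdf_bytes[header.end():]
            dict_part = re.split(rb"\bstream\b|\bendobj\b", tail, maxsplit=1)[0]
            looks_valid = re.search(rb"/Type\s*/XRef\b", dict_part) is not None
        results.append((offset, looks_valid))
    return results


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for offset, ok in check_xrefstm_offsets(handle.read()):
            status = "points at an XRef stream" if ok else "stale or invalid"
            print(f"/XRefStm {offset}: {status}")
```

Run against the saved file (for example modified_Word2010.pdf from this
issue), an entry reported as stale or invalid would be consistent with the
behaviour described here; the script cannot say anything about why a given
third-party reader rejects the file.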
was:
From: Tim Costermans [mailto:[email protected]]
Sent: Monday 31 March 2014 12:57
To: [email protected]
Subject: RE: PDFBox 1.8.4 and pdf's generated by MS Word
Hello,
I’ve written a test case to reproduce the issue. (see patch)
Could someone have a look at it and give me some pointers on how to solve this
issue? I applied this patch on the 1.8.4 tag I checked out locally.
The issue is that I don’t know the pdf spec, so I don’t know how to fix this
issue in the PDFBOX source code.
Word2010.pdf is the input PDF. I open the document with PDFBox and add a
string to it, in this case ‘Hello world!’.
Afterwards I save the PDF.
If I look at the content of the pdf before and after I modified it (using
Notepad++) I see this:
Word2010.pdf:
Line 647: <</Size 18/Root 1 0 R/Info 7 0
R/ID[<AE9AF29D5A22AE47B47C4DA29170BE64><AE9AF29D5A22AE47B47C4DA29170BE64>]
/Prev 81972/XRefStm 81702>>
modified_Word2010.pdf:
Line 791: /XRefStm 81702
XRefStm is not updated, although the original PDF had multiple revisions that
were merged into a new PDF document.
A third-party library we use depends on this XRefStm value and cannot open the
PDF after it was modified. (For the stack trace, see my previous message.)
Any help would be much appreciated.
Kind regards,
Tim Costermans
Hi Tim,
that’s a bug.
Explanation: The original file uses what’s called a hybrid reference. That’s
for compatibility with readers which do not support compressed reference
streams. The file generated by PDFBox doesn’t use hybrid references any more
but still contains the XRefStm info in the trailer dictionary.
Could you file an issue at https://issues.apache.org/jira/browse/PDFBOX
BR
Maruan
> Hybrid reference pdf still contain XRefStm info in the trailer dictionary
> after PDDocument#save
> ----------------------------------------------------------------------------------------------
>
> Key: PDFBOX-2015
> URL: https://issues.apache.org/jira/browse/PDFBOX-2015
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 1.8.4
> Reporter: Tim Costermans
> Attachments: Word2010.pdf, XRefStm_not_updated.patch,
> modified_Word2010.pdf
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)