[
https://issues.apache.org/jira/browse/PDFBOX-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259001#comment-14259001
]
Tilman Hausherr edited comment on PDFBOX-2440 at 12/26/14 10:14 AM:
--------------------------------------------------------------------
Sorry, but this doesn't work for many files: saved files are now corrupt and
can't be reopened by PDFBox or by Adobe Reader. I'll add an example and will
commit the modification I made here to test this.
{code}
SCHWERWIEGEND: Error converting file asy-functionshading.pdf
java.lang.NullPointerException
at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:91)
at org.apache.pdfbox.pdfparser.PDFParser.parseXrefStream(PDFParser.java:900)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseXrefObjStream(NonSequentialPDFParser.java:749)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseXref(NonSequentialPDFParser.java:711)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:480)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:956)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:941)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:887)
{code}
Complete list of affected files:
annots.pdf
asy-functionshading.pdf
color_gradient.pdf
genko_oc_shiryo1.pdf
gs-bugzilla691021.pdf
gs-bugzilla691118.pdf
gs-bugzilla693027.pdf
gs-bugzilla694310.pdf
gs-bugzilla694385.pdf
K_UPMEYER_SPRING10.pdf
PDFBOX-1128.pdf
PDFBOX-1292.pdf
PDFBOX-1537.pdf
PDFBOX-1538.pdf
PDFBOX-1686-test1.pdf
PDFBOX-1686-test2.pdf
PDFBOX-1691-FORIS-HV.pdf
PDFBOX-1727.pdf
PDFBOX-1743.pdf
PDFBOX-1772.pdf
PDFBOX-1809.pdf
PDFBOX-1900-checkbox.pdf
PDFBOX-2001.pdf
PDFBOX-2023-zerofontheight.pdf
PDFBOX-2024-rot180.pdf
PDFBOX-2046.pdf
PDFBOX-2059-zerowidth.pdf
PDFBOX-2093-bullets.pdf
PDFBOX-2158.pdf
PDFBOX-2241.pdf
PDFBOX-2282-signature.pdf
PDFBOX-2359.pdf
PDFBOX-2398.pdf
PDFBOX-2579-confidential.pdf
PDFBOX-869-cloud.pdf
PDFBOX-988.pdf
PDFBOX-993-transparent-image.pdf
> xref stream is saved as table
> -----------------------------
>
> Key: PDFBOX-2440
> URL: https://issues.apache.org/jira/browse/PDFBOX-2440
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 1.8.7
> Reporter: WB
> Assignee: Andreas Lehmkühler
> Fix For: 2.0.0
>
> Attachments: COSWriter.diff, asy-functionshading.pdf, lipsum.pdf,
> pdfbox4177349906426869579.pdf
>
>
> When saving a PDDocument, PDFBox seems to always write an xref table, even
> when the original file contains an xref stream.
> To reproduce, load a PDF file (like the one attached) with PDDocument#load
> (or PDDocument#loadNonSeq, same result) and then save it with PDDocument#save
> to another file.
> It seems to me that the problem is in COSWriter#doWriteXRef. When
> COSDocument#isXRefStream is true, the xref entries should be wrapped in a
> stream, but they're written to output one by one. I think that part should
> look more like its counterpart in COSWriter#doWriteXRefInc.
> I made some changes to doWriteXRef accordingly and it seems to work for PDFs
> that have never been incrementally updated but leads to corrupt files when
> the PDF has been incrementally updated before :(
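
To check which kind of cross-reference section a saved file ends up with, a
small standalone check along these lines may help. It is only a sketch and
does not use any PDFBox API: the class name XrefKind and the method detect are
made up for illustration. It relies on the PDF file layout, where the number
after the last "startxref" keyword is the byte offset of the final
cross-reference section, which begins with the keyword "xref" for a table or
with an indirect object header ("N G obj") for an xref stream. It does not
validate the file and does not follow /Prev chains of incrementally updated
files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class XrefKind {
    // Header of an indirect object, e.g. "12 0 obj" (assumed to be an xref stream here).
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    /** Returns "table", "stream" or "unknown" for the section the last startxref points to. */
    static String detect(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int marker = text.lastIndexOf("startxref");
        if (marker < 0) {
            return "unknown";
        }
        // Skip whitespace after the keyword, then read the decimal offset.
        int pos = marker + "startxref".length();
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos || pos - start > 10) {
            return "unknown"; // no offset, or too long to fit in an int-indexed buffer
        }
        long offset = Long.parseLong(text.substring(start, pos));
        if (offset >= text.length()) {
            return "unknown";
        }
        String section = text.substring((int) offset);
        if (section.startsWith("xref")) {
            return "table";
        }
        if (OBJ_HEADER.matcher(section).lookingAt()) {
            return "stream"; // heuristic: /Type /XRef is not verified
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java XrefKind <file.pdf>");
            return;
        }
        System.out.println(detect(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Running it on the original file and on the file written by PDDocument#save
should show whether an xref stream was turned into a table.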
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)