[ https://issues.apache.org/jira/browse/PDFBOX-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259001#comment-14259001 ]
Tilman Hausherr edited comment on PDFBOX-2440 at 12/26/14 9:05 AM:
-------------------------------------------------------------------

Sorry, but this doesn't work for many files: saved files are now corrupt and can't be reopened by PDFBox or by Adobe Reader. I'll add an example and will commit the modification I made here to test this.

{code}
SEVERE: Error converting file asy-functionshading.pdf
java.lang.NullPointerException
	at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:91)
	at org.apache.pdfbox.pdfparser.PDFParser.parseXrefStream(PDFParser.java:900)
	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseXrefObjStream(NonSequentialPDFParser.java:749)
	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseXref(NonSequentialPDFParser.java:711)
	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:480)
	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:956)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:941)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:887)
{code}


> xref stream is saved as table
> -----------------------------
>
>                 Key: PDFBOX-2440
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2440
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 1.8.7
>            Reporter: WB
>            Assignee: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>         Attachments: COSWriter.diff, asy-functionshading.pdf, lipsum.pdf, pdfbox4177349906426869579.pdf
>
>
> When saving a PDDocument, PDFBox seems to always write an xref table, even
> when the original file contains an xref stream.
> To reproduce, load a PDF file (like the one attached) with PDDocument#load
> (or PDDocument#loadNonSeq, same result) and then save it with PDDocument#save
> to another file.
> It seems to me that the problem is in COSWriter#doWriteXRef. When
> COSDocument#isXRefStream is true, the xref entries should be wrapped in a
> stream, but they're written to output one by one. I think that part should
> look more like its counterpart in COSWriter#doWriteXRefInc.
> I made some changes to doWriteXRef accordingly and it seems to work for PDFs
> that have never been incrementally updated but leads to corrupt files when
> the PDF has been incrementally updated before :(



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
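
For anyone reproducing this, one way to check whether a loaded or saved file ends with an xref table or an xref stream, without going through PDFBox at all, is to look at what the last startxref offset points to. The sketch below is only a heuristic under stated assumptions: XrefKindCheck, Kind and detect are hypothetical names (not PDFBox API), the whole file is assumed to fit in memory, and only the last startxref entry is inspected, so earlier revisions of an incrementally updated file are ignored.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: classifies the last cross-reference section of a PDF. */
final class XrefKindCheck {

    enum Kind { TABLE, STREAM, UNKNOWN }

    static Kind detect(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte offsets.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);

        // The last "startxref" keyword is followed by the byte offset of the newest xref section.
        int keyword = s.lastIndexOf("startxref");
        if (keyword < 0) {
            return Kind.UNKNOWN;
        }
        int i = keyword + "startxref".length();
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        int digitsStart = i;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        if (i == digitsStart) {
            return Kind.UNKNOWN; // no offset after the keyword
        }

        long offset;
        try {
            offset = Long.parseLong(s.substring(digitsStart, i));
        } catch (NumberFormatException e) {
            return Kind.UNKNOWN; // offset too large to be meaningful
        }
        if (offset >= s.length()) {
            return Kind.UNKNOWN; // offset points past the end of the file
        }

        // Tolerate stray whitespace in front of the section itself.
        int at = (int) offset;
        while (at < s.length() && Character.isWhitespace(s.charAt(at))) {
            at++;
        }

        // A classic table starts with the "xref" keyword.
        if (s.startsWith("xref", at)) {
            return Kind.TABLE;
        }
        // An xref stream is an indirect object, so it starts with "<num> <gen> obj".
        // Assumption: this sketch does not verify that the object really has /Type /XRef.
        String head = s.substring(at, Math.min(s.length(), at + 32));
        if (head.matches("(?s)\\d+\\s+\\d+\\s+obj.*")) {
            return Kind.STREAM;
        }
        return Kind.UNKNOWN;
    }
}
```

For a file on disk, calling XrefKindCheck.detect(java.nio.file.Files.readAllBytes(path)) on the original and on the saved copy should show whether the kind changed from STREAM to TABLE during the save.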