[ https://issues.apache.org/jira/browse/PDFBOX-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

W B updated PDFBOX-2440:
------------------------
    Description: 
When saving a PDDocument, PDFBox always seems to write an xref table, even when 
the original file contains an xref stream.

To reproduce, load a PDF file (such as the attached lipsum.pdf) with 
PDDocument#load (or PDDocument#loadNonSeq, which gives the same result) and then 
save it to another file with PDDocument#save.
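One way to check the saved file without opening it in a viewer is to follow the offset after the last startxref and see what it points at: the keyword xref means a classic table, while an indirect object header ("N G obj") means an xref stream (PDF 1.5+). The class below is a hypothetical, stdlib-only helper, not part of PDFBox. It assumes a non-linearized file, looks only at the final cross-reference section, and does not inspect the object's dictionary, so a hybrid file whose trailer uses /XRefStm would still report "table".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of PDFBox): reports whether the cross-reference
 * section referenced by a PDF's last "startxref" is a classic table or an xref stream.
 */
public class XrefKind {

    // An indirect object header such as "12 0 obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    static String detect(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices are byte offsets.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        int i = s.lastIndexOf("startxref");
        if (i < 0) {
            return "unknown";
        }
        i += "startxref".length();
        while (i < s.length() && isPdfWhitespace(s.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        if (start == i || i - start > 10) {
            return "unknown"; // missing or implausibly long offset
        }
        long offset = Long.parseLong(s.substring(start, i));
        if (offset >= s.length()) {
            return "unknown";
        }
        int off = (int) offset;
        if (s.startsWith("xref", off)) {
            return "table"; // classic table: starts with the "xref" keyword
        }
        if (OBJ_HEADER.matcher(s).region(off, s.length()).lookingAt()) {
            return "stream"; // object header found; /Type /XRef is not verified here
        }
        return "unknown";
    }

    private static boolean isPdfWhitespace(char c) {
        return c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\f' || c == '\0';
    }

    public static void main(String[] args) throws IOException {
        // Usage: java XrefKind saved.pdf
        System.out.println(detect(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

If the behaviour described in this issue reproduces, running it on the original file and on the file written by PDDocument#save would be expected to print "stream" and "table" respectively.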

The problem appears to be in COSWriter#doWriteXRef. When 
COSDocument#isXRefStream is true, the xref entries should be wrapped in a 
single xref stream, but instead they are written to the output one by one as 
table entries. I think that part should look more like its counterpart in 
COSWriter#doWriteXRefInc.
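For reference, a rough illustration of the difference between the two forms: a classic xref table writes one 20-byte text line per entry, whereas an xref stream (ISO 32000-1, section 7.5.8) packs each entry into fixed-width, big-endian binary fields whose byte widths come from the /W array, and stores those bytes as the data of a stream object with /Type /XRef. The class and method names below are hypothetical and this is not PDFBox's COSWriter code. It writes an uncompressed stream, assumes the entries cover objects 0 through /Size - 1 (so /Index is omitted), and takes the trailer keys (such as /Root) that the stream dictionary must also carry as a plain string.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical illustration: the same xref entries as table rows vs. xref stream data. */
public class XrefEncoding {

    /** Classic table entry: exactly 20 bytes, "oooooooooo ggggg n\r\n". */
    static String tableEntry(long offset, int generation, boolean inUse) {
        return String.format(Locale.ROOT, "%010d %05d %c\r\n", offset, generation, inUse ? 'n' : 'f');
    }

    /**
     * Packs entries for an xref stream. Each entry is {type, field2, field3}:
     * type 0 = free, 1 = in use (byte offset, generation), 2 = stored in an object stream.
     * Each field is written big-endian using the byte width given in w (the /W array).
     */
    static byte[] streamRows(long[][] entries, int[] w) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long[] entry : entries) {
            for (int field = 0; field < 3; field++) {
                writeBigEndian(out, entry[field], w[field]);
            }
        }
        return out.toByteArray();
    }

    private static void writeBigEndian(ByteArrayOutputStream out, long value, int width) {
        if (width < 8 && (value >>> (8 * width)) != 0) {
            throw new IllegalArgumentException(value + " does not fit in " + width + " bytes");
        }
        for (int b = width - 1; b >= 0; b--) {
            out.write((int) ((value >>> (8 * b)) & 0xFF));
        }
    }

    /** Wraps the packed rows in one uncompressed stream object instead of writing rows one by one. */
    static byte[] xrefStreamObject(int objNum, int[] w, long[][] entries, String trailerKeys) {
        byte[] rows = streamRows(entries, w);
        String header = objNum + " 0 obj\n<< /Type /XRef /Size " + entries.length
                + " /W [" + w[0] + " " + w[1] + " " + w[2] + "] /Length " + rows.length
                + " " + trailerKeys + " >>\nstream\n";
        byte[] h = header.getBytes(StandardCharsets.ISO_8859_1);
        byte[] f = "\nendstream\nendobj\n".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(h, 0, h.length);
        out.write(rows, 0, rows.length);
        out.write(f, 0, f.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.print(tableEntry(17, 0, true)); // prints "0000000017 00000 n"
        long[][] entries = { {0, 0, 65535}, {1, 17, 0} };
        byte[] rows = streamRows(entries, new int[] {1, 4, 2});
        System.out.println(rows.length + " bytes of packed rows"); // 2 entries x 7 bytes = 14
    }
}
```

Packing into one stream is what lets the writer keep the file's original cross-reference format; the widths {1, 4, 2} are only an example and would normally be chosen to fit the largest offset and generation number.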

I changed doWriteXRef accordingly. This seems to work for PDFs that have never 
been incrementally updated, but it produces corrupt files when the PDF has been 
incrementally updated before :(

  was:
When saving a PDDocument, PDFBox seems to always write an xref table, even when 
the original file contains an xref stream.

To reproduce, load a PDF file (like the one attached) with PDDocument#load (or 
PDDocument#loadNonSeq, same result) and then save it with PDDocument#save to 
another file.

It seems to me that the problem is in COSWriter#doWriteXRef. When 
doc#isXRefStream is true, the xref entries should be wrapped in a stream, but 
they're written to output one by one. I think that part should look more like 
its counterpart in COSWriter#doWriteXRefInc.

I made some changes to doWriteXRef accordingly and it seems to work for PDFs 
that have never been incrementally updated but leads to corrupt files when the 
PDF has been incrementally updated before :(


> xref stream is saved as table
> -----------------------------
>
>                 Key: PDFBOX-2440
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2440
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Writing
>    Affects Versions: 1.8.7
>            Reporter: W B
>         Attachments: COSWriter.diff, lipsum.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)