[
https://issues.apache.org/jira/browse/PDFBOX-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065652#comment-18065652
]
Michael Klink edited comment on PDFBOX-6176 at 3/13/26 1:53 PM:
----------------------------------------------------------------
Ok, your original file has an error in its cross reference table, and PDFBox
produces an output with a similar error and another error on top.
h3. In detail
The PDF specification (both the old ISO 32000-1 and the current ISO 32000-2)
requires:
{quote}The cross-reference table (comprising the original cross-reference
section and all update sections) shall contain one entry for each object number
from 0 to the maximum object number defined in the PDF file, even if one or
more of the object numbers in this range do not actually occur in the PDF
file.{quote}
In your original file, though, there are gaps: the maximum object number in it
is 7409 but there are no entries for the object numbers 1..3501, 3715..7402,
and 7404..7406.
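To illustrate that check, here is a minimal sketch in plain Java (no PDFBox
API involved; the class name XrefGapFinder and the method findGaps are
hypothetical). Given the object numbers that do have cross-reference entries,
it lists the ranges between 0 and the maximum that have none. The sample set
is reconstructed from the gaps listed above and assumes that object 0 has an
entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class XrefGapFinder {

    /**
     * Returns the inclusive ranges "start..end" between 0 and the highest
     * number in {@code entries} for which no entry exists.
     * Assumes all object numbers are non-negative.
     */
    static List<String> findGaps(SortedSet<Long> entries) {
        List<String> gaps = new ArrayList<>();
        long expected = 0; // next object number that should have an entry
        for (long n : entries) {
            if (n > expected) {
                gaps.add(expected + ".." + (n - 1));
            }
            expected = n + 1;
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Entries reconstructed from the gaps reported for the original file
        // (assumption: object 0 is present).
        SortedSet<Long> entries = new TreeSet<>();
        entries.add(0L);
        for (long i = 3502; i <= 3714; i++) entries.add(i);
        entries.add(7403L);
        for (long i = 7407; i <= 7409; i++) entries.add(i);

        System.out.println(findGaps(entries));
        // prints [1..3501, 3715..7402, 7404..7406]
    }
}
```

A file that satisfies the requirement would yield an empty list here.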
In your code, PDFBox reads that PDF and saves it anew. While doing so, it
changes the cross references from the old table structure to the new stream
structure but keeps the gaps. As the above requirement applies analogously to
cross reference streams, this is already one error in the PDFBox output,
although it can also be considered GIGO (garbage in, garbage out).
On top of that, though, PDFBox adds another error: there is also no entry for
the cross reference stream object itself in the cross references. According to
the spec, though:
{quote}Like any stream, a cross-reference stream shall be an indirect object.
Therefore, an entry for it shall exist in either a cross-reference stream
(usually itself) or in a cross-reference table{quote}
As the cross reference stream in that file has the highest object number, this
additional error causes the qpdf warning.
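The consistency condition behind that warning can be sketched as follows, again
in plain Java with hypothetical names (XrefSizeCheck and its methods are not
qpdf or PDFBox API). The numbers 7412 and 7410 are taken from the qpdf warning;
that the cross reference stream is object 7411 is only an inference from those
two numbers, not something read from the file.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class XrefSizeCheck {

    /** True if the reported size is one plus the highest object number with an entry. */
    static boolean sizeMatchesHighestEntry(long reportedSize, SortedSet<Long> entries) {
        return !entries.isEmpty() && reportedSize == entries.last() + 1;
    }

    /** True if the cross reference stream object has an entry of its own. */
    static boolean hasSelfEntry(long xrefStreamObjectNumber, SortedSet<Long> entries) {
        return entries.contains(xrefStreamObjectNumber);
    }

    public static void main(String[] args) {
        // Only the two entries relevant for the check are modelled here.
        SortedSet<Long> entries = new TreeSet<>(List.of(0L, 7410L));
        long reportedSize = 7412L;   // from the qpdf warning
        long xrefStreamObject = 7411L; // assumption, see above

        System.out.println(sizeMatchesHighestEntry(reportedSize, entries));
        System.out.println(hasSelfEntry(xrefStreamObject, entries));

        // With an entry for the cross reference stream itself, both checks hold.
        entries.add(xrefStreamObject);
        System.out.println(sizeMatchesHighestEntry(reportedSize, entries));
    }
}
```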
----
Both issues usually cause no harm when the files are processed by another PDF
processor. One exception is signing use cases: there, incomplete cross
references are known to cause validators (in particular Adobe Acrobat) to
reject valid signatures.
> reported number of objects (7412) is not one plus the highest object number
> (7410)
> ----------------------------------------------------------------------------------
>
> Key: PDFBOX-6176
> URL: https://issues.apache.org/jira/browse/PDFBOX-6176
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 3.0.0 PDFBox, 3.0.7 PDFBox
> Reporter: Daniel Persson
> Priority: Minor
> Attachments: 180511_A-14.pdf, test.pdf
>
>
> A new customer reported that they got a bunch of errors during a split
> operation in their flow.
> {code:java}
> $ qpdf --split-pages=1 test.pdf page-%d.pdf
> WARNING: test.pdf: reported number of objects (7412) is not one plus the
> highest object number (7410)
> qpdf: operation succeeded with warnings; resulting file may have some
> problems {code}
> Seems I could recreate this issue with a lot of files just by loading and
> saving a PDF.
> {code:java}
> public static void main(String[] args) throws Exception {
>     // PDFBox 2.0.x API: PDDocument doc = PDDocument.load(new File("180511_A-14.pdf"));
>     // PDFBox 3.0.x API:
>     PDDocument doc = Loader.loadPDF(new File("180511_A-14.pdf"));
>     doc.save(new File("test.pdf"));
> } {code}
>
> I've only been able to reproduce the error with 3.0.x not with 2.0.x.