Andreas Lehmkühler created PDFBOX-5945:
------------------------------------------

             Summary: Wrong size entry in trailer after incremental save
                 Key: PDFBOX-5945
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5945
             Project: PDFBox
          Issue Type: Bug
          Components: Writing
    Affects Versions: 3.0.4 PDFBox
            Reporter: Andreas Lehmkühler
            Assignee: Andreas Lehmkühler


Lahcen FTIH reports the following issue on users@pdfbox
 
*Issue Description*
When performing an incremental update on a PDF document that has a
cross-reference table (not a cross-reference stream) without creating any
new object references, the *trailer’s size entry* is set incorrectly.
Specifically, it becomes 1 greater than the highest object number in the
*current* trailer—rather than 1 greater than the highest object number in
the *entire* document. This leads to problems because:

1. Some PDF editors determine the next reference to use based on this
size value, leading to inconsistencies.
2. Some PDF readers may consider the file corrupt since, according to
the PDF specification, any object in a cross-reference section whose number
is greater than the *size* value “shall be ignored and defined to be
missing.”

Below is the minimal code example to reproduce this issue. As you can see
in the log output, the input PDF has a last trailer size of 10, but after
the incremental save, the last trailer size becomes 6, even though the
highest object number remains 9.

{code:java}
package org.apache.pdfbox;
import org.apache.pdfbox.cos.COSDocument;import
org.apache.pdfbox.cos.COSName;import
org.apache.pdfbox.cos.COSObjectKey;import
org.apache.pdfbox.pdfwriter.compress.CompressParameters;import
org.apache.pdfbox.pdmodel.PDDocument;import
org.apache.pdfbox.pdmodel.PDPage;import
org.apache.pdfbox.pdmodel.PDResources;import
org.apache.pdfbox.pdmodel.common.PDRectangle;import
org.apache.pdfbox.pdmodel.font.PDFont;import
org.apache.pdfbox.pdmodel.font.PDType1Font;import
org.apache.pdfbox.pdmodel.font.Standard14Fonts;import
org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;import
org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;import
org.apache.pdfbox.pdmodel.interactive.form.PDTextField;
import java.io.ByteArrayOutputStream;import java.io.IOException;
public class ReproduceBug {
    private static final String FIELD_NAME = "textFieldName";

    public static void main(String[] args) {
        byte[] input = create();
        printInfo(input, "input");
        byte[] output = edit(input);
        printInfo(output, "output");
    }

    private static byte[] create() {
        try (PDDocument pdDocument = new PDDocument()) {
            PDAcroForm acroForm = addAcroForm(pdDocument);
            addFonts(acroForm);
            PDPage page = addPage(pdDocument);
            addTextField(acroForm, page);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            pdDocument.save(out, CompressParameters.NO_COMPRESSION);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private static byte[] edit(byte[] input) {
        try (PDDocument pdDocument = loadPDF(input)) {
            PDTextField textField = (PDTextField)
pdDocument.getDocumentCatalog().getAcroForm().getField(FIELD_NAME);
            assert textField != null;
            textField.setMultiline(true);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            pdDocument.saveIncremental(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private static PDDocument loadPDF(byte[] content) throws IOException {
        return org.apache.pdfbox.Loader.loadPDF(content);
    }

    private static void addFonts(PDAcroForm acroForm) throws IOException {
        PDFont font1 = new PDType1Font(Standard14Fonts.FontName.HELVETICA);
        PDFont font2 = new PDType1Font(Standard14Fonts.FontName.ZAPF_DINGBATS);
        PDResources resources = new PDResources();
        resources.put(COSName.getPDFName("Helv"), font1);
        resources.put(COSName.getPDFName("ZaDb"), font2);
        acroForm.setDefaultResources(resources);
    }

    private static PDAcroForm addAcroForm(PDDocument pdDocument)
throws IOException {
        PDAcroForm acroForm = new PDAcroForm(pdDocument);
        pdDocument.getDocumentCatalog().setAcroForm(acroForm);
        return acroForm;
    }

    private static PDPage addPage(PDDocument pdDocument) throws IOException {
        PDPage page = new PDPage(PDRectangle.A4);
        pdDocument.addPage(page);
        return page;
    }

    private static void addTextField(PDAcroForm acroForm, PDPage page)
throws IOException {
        PDTextField textField = new PDTextField(acroForm);
        textField.setPartialName(FIELD_NAME);
        acroForm.getFields().add(textField);
        PDAnnotationWidget widget = textField.getWidgets().get(0);
        widget.setPage(page);
        page.getAnnotations().add(widget);
        PDRectangle rectangle = new PDRectangle(10, 200, 200, 15);
        widget.setRectangle(rectangle);
    }

    private static void printInfo(byte[] content, String name) {
        
System.out.println("------------------------------------------------------");
        System.out.println("-- " + name);
        try (PDDocument pdDocument = loadPDF(content)) {
            COSDocument cosDocument = pdDocument.getDocument();
            System.out.println("getXrefTable.keySet.maxNumber: " +
                cosDocument.getXrefTable().keySet().stream()
                           .mapToLong(COSObjectKey::getNumber)
                           .max().getAsLong());
            System.out.println("lastTrailer.size: " +
                cosDocument.getTrailer().getLong(COSName.SIZE));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        
System.out.println("------------------------------------------------------");
    }
}
{code}

*Console Output*

------------------------------------------------------
-- input
getXrefTable.keySet.maxNumber: 9lastTrailer.size: 10
------------------------------------------------------** lastEntry is: 5
------------------------------------------------------
-- output
getXrefTable.keySet.maxNumber: 9lastTrailer.size: 6
------------------------------------------------------

As the output shows:

   - The highest object number remains 9.
   - The final trailer size is incorrectly set to 6 after the incremental
   save, whereas it should remain 10.

*Impact*

   - PDF editors that use the trailer’s size to determine the next free
   object reference can fail to handle the PDF correctly.
   - Some readers may reject or treat the PDF as corrupted if they strictly
   follow the specification about ignored objects.

*Request*
Could you please confirm whether this is indeed a regression? I would
appreciate any workaround suggestions or information on whether a fix is
planned in upcoming releases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Reply via email to