[ 
https://issues.apache.org/jira/browse/PDFBOX-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5945:
---------------------------------------
    Description: 
Lahcen FTIH reports the following issue on 
[users@pdfbox|https://lists.apache.org/thread/23f3pzwntt1dhr7vq131d29k29k277gc]
 
*Issue Description*
When performing an incremental update on a PDF document that has a
cross-reference table (not a cross-reference stream) without creating any
new object references, the *trailer’s size entry* is set incorrectly.
Specifically, it becomes 1 greater than the highest object number in the
*current* trailer—rather than 1 greater than the highest object number in
the *entire* document. This leads to problems because:

1. Some PDF editors determine the next reference to use based on this
size value, leading to inconsistencies.
2. Some PDF readers may consider the file corrupt since, according to
the PDF specification, any object in a cross-reference section whose number
is greater than the *size* value “shall be ignored and defined to be
missing.”

Below is the minimal code example to reproduce this issue. As you can see
in the log output, the input PDF has a last trailer size of 10, but after
the incremental save, the last trailer size becomes 6, even though the
highest object number remains 9.

{code:java}
package org.apache.pdfbox;
import org.apache.pdfbox.cos.COSDocument;import
org.apache.pdfbox.cos.COSName;import
org.apache.pdfbox.cos.COSObjectKey;import
org.apache.pdfbox.pdfwriter.compress.CompressParameters;import
org.apache.pdfbox.pdmodel.PDDocument;import
org.apache.pdfbox.pdmodel.PDPage;import
org.apache.pdfbox.pdmodel.PDResources;import
org.apache.pdfbox.pdmodel.common.PDRectangle;import
org.apache.pdfbox.pdmodel.font.PDFont;import
org.apache.pdfbox.pdmodel.font.PDType1Font;import
org.apache.pdfbox.pdmodel.font.Standard14Fonts;import
org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;import
org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;import
org.apache.pdfbox.pdmodel.interactive.form.PDTextField;
import java.io.ByteArrayOutputStream;import java.io.IOException;
public class ReproduceBug {
    private static final String FIELD_NAME = "textFieldName";

    public static void main(String[] args) {
        byte[] input = create();
        printInfo(input, "input");
        byte[] output = edit(input);
        printInfo(output, "output");
    }

    private static byte[] create() {
        try (PDDocument pdDocument = new PDDocument()) {
            PDAcroForm acroForm = addAcroForm(pdDocument);
            addFonts(acroForm);
            PDPage page = addPage(pdDocument);
            addTextField(acroForm, page);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            pdDocument.save(out, CompressParameters.NO_COMPRESSION);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private static byte[] edit(byte[] input) {
        try (PDDocument pdDocument = loadPDF(input)) {
            PDTextField textField = (PDTextField)
pdDocument.getDocumentCatalog().getAcroForm().getField(FIELD_NAME);
            assert textField != null;
            textField.setMultiline(true);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            pdDocument.saveIncremental(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private static PDDocument loadPDF(byte[] content) throws IOException {
        return org.apache.pdfbox.Loader.loadPDF(content);
    }

    private static void addFonts(PDAcroForm acroForm) throws IOException {
        PDFont font1 = new PDType1Font(Standard14Fonts.FontName.HELVETICA);
        PDFont font2 = new PDType1Font(Standard14Fonts.FontName.ZAPF_DINGBATS);
        PDResources resources = new PDResources();
        resources.put(COSName.getPDFName("Helv"), font1);
        resources.put(COSName.getPDFName("ZaDb"), font2);
        acroForm.setDefaultResources(resources);
    }

    private static PDAcroForm addAcroForm(PDDocument pdDocument)
throws IOException {
        PDAcroForm acroForm = new PDAcroForm(pdDocument);
        pdDocument.getDocumentCatalog().setAcroForm(acroForm);
        return acroForm;
    }

    private static PDPage addPage(PDDocument pdDocument) throws IOException {
        PDPage page = new PDPage(PDRectangle.A4);
        pdDocument.addPage(page);
        return page;
    }

    private static void addTextField(PDAcroForm acroForm, PDPage page)
throws IOException {
        PDTextField textField = new PDTextField(acroForm);
        textField.setPartialName(FIELD_NAME);
        acroForm.getFields().add(textField);
        PDAnnotationWidget widget = textField.getWidgets().get(0);
        widget.setPage(page);
        page.getAnnotations().add(widget);
        PDRectangle rectangle = new PDRectangle(10, 200, 200, 15);
        widget.setRectangle(rectangle);
    }

    private static void printInfo(byte[] content, String name) {
        
System.out.println("------------------------------------------------------");
        System.out.println("-- " + name);
        try (PDDocument pdDocument = loadPDF(content)) {
            COSDocument cosDocument = pdDocument.getDocument();
            System.out.println("getXrefTable.keySet.maxNumber: " +
                cosDocument.getXrefTable().keySet().stream()
                           .mapToLong(COSObjectKey::getNumber)
                           .max().getAsLong());
            System.out.println("lastTrailer.size: " +
                cosDocument.getTrailer().getLong(COSName.SIZE));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        
System.out.println("------------------------------------------------------");
    }
}
{code}

*Console Output*

------------------------------------------------------
-- input
getXrefTable.keySet.maxNumber: 9lastTrailer.size: 10
------------------------------------------------------** lastEntry is: 5
------------------------------------------------------
-- output
getXrefTable.keySet.maxNumber: 9lastTrailer.size: 6
------------------------------------------------------

As the output shows:

   - The highest object number remains 9.
   - The final trailer size is incorrectly set to 6 after the incremental
   save, whereas it should remain 10.

*Impact*

   - PDF editors that use the trailer’s size to determine the next free
   object reference can fail to handle the PDF correctly.
   - Some readers may reject or treat the PDF as corrupted if they strictly
   follow the specification about ignored objects.

*Request*
Could you please confirm whether this is indeed a regression? I would
appreciate any workaround suggestions or information on whether a fix is
planned in upcoming releases.

  was:
Lahcen FTIH reports the following issue on users@pdfbox
 
*Issue Description*
When performing an incremental update on a PDF document that has a
cross-reference table (not a cross-reference stream) without creating any
new object references, the *trailer’s size entry* is set incorrectly.
Specifically, it becomes 1 greater than the highest object number in the
*current* trailer—rather than 1 greater than the highest object number in
the *entire* document. This leads to problems because:

1. Some PDF editors determine the next reference to use based on this
size value, leading to inconsistencies.
2. Some PDF readers may consider the file corrupt since, according to
the PDF specification, any object in a cross-reference section whose number
is greater than the *size* value “shall be ignored and defined to be
missing.”

Below is the minimal code example to reproduce this issue. As you can see
in the log output, the input PDF has a last trailer size of 10, but after
the incremental save, the last trailer size becomes 6, even though the
highest object number remains 9.

{code:java}
package org.apache.pdfbox;
import org.apache.pdfbox.cos.COSDocument;import
org.apache.pdfbox.cos.COSName;import
org.apache.pdfbox.cos.COSObjectKey;import
org.apache.pdfbox.pdfwriter.compress.CompressParameters;import
org.apache.pdfbox.pdmodel.PDDocument;import
org.apache.pdfbox.pdmodel.PDPage;import
org.apache.pdfbox.pdmodel.PDResources;import
org.apache.pdfbox.pdmodel.common.PDRectangle;import
org.apache.pdfbox.pdmodel.font.PDFont;import
org.apache.pdfbox.pdmodel.font.PDType1Font;import
org.apache.pdfbox.pdmodel.font.Standard14Fonts;import
org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;import
org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;import
org.apache.pdfbox.pdmodel.interactive.form.PDTextField;
import java.io.ByteArrayOutputStream;import java.io.IOException;
public class ReproduceBug {
    private static final String FIELD_NAME = "textFieldName";

    public static void main(String[] args) {
        byte[] input = create();
        printInfo(input, "input");
        byte[] output = edit(input);
        printInfo(output, "output");
    }

    private static byte[] create() {
        try (PDDocument pdDocument = new PDDocument()) {
            PDAcroForm acroForm = addAcroForm(pdDocument);
            addFonts(acroForm);
            PDPage page = addPage(pdDocument);
            addTextField(acroForm, page);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            pdDocument.save(out, CompressParameters.NO_COMPRESSION);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private static byte[] edit(byte[] input) {
        try (PDDocument pdDocument = loadPDF(input)) {
            PDTextField textField = (PDTextField)
pdDocument.getDocumentCatalog().getAcroForm().getField(FIELD_NAME);
            assert textField != null;
            textField.setMultiline(true);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            pdDocument.saveIncremental(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private static PDDocument loadPDF(byte[] content) throws IOException {
        return org.apache.pdfbox.Loader.loadPDF(content);
    }

    private static void addFonts(PDAcroForm acroForm) throws IOException {
        PDFont font1 = new PDType1Font(Standard14Fonts.FontName.HELVETICA);
        PDFont font2 = new PDType1Font(Standard14Fonts.FontName.ZAPF_DINGBATS);
        PDResources resources = new PDResources();
        resources.put(COSName.getPDFName("Helv"), font1);
        resources.put(COSName.getPDFName("ZaDb"), font2);
        acroForm.setDefaultResources(resources);
    }

    private static PDAcroForm addAcroForm(PDDocument pdDocument)
throws IOException {
        PDAcroForm acroForm = new PDAcroForm(pdDocument);
        pdDocument.getDocumentCatalog().setAcroForm(acroForm);
        return acroForm;
    }

    private static PDPage addPage(PDDocument pdDocument) throws IOException {
        PDPage page = new PDPage(PDRectangle.A4);
        pdDocument.addPage(page);
        return page;
    }

    private static void addTextField(PDAcroForm acroForm, PDPage page)
throws IOException {
        PDTextField textField = new PDTextField(acroForm);
        textField.setPartialName(FIELD_NAME);
        acroForm.getFields().add(textField);
        PDAnnotationWidget widget = textField.getWidgets().get(0);
        widget.setPage(page);
        page.getAnnotations().add(widget);
        PDRectangle rectangle = new PDRectangle(10, 200, 200, 15);
        widget.setRectangle(rectangle);
    }

    private static void printInfo(byte[] content, String name) {
        
System.out.println("------------------------------------------------------");
        System.out.println("-- " + name);
        try (PDDocument pdDocument = loadPDF(content)) {
            COSDocument cosDocument = pdDocument.getDocument();
            System.out.println("getXrefTable.keySet.maxNumber: " +
                cosDocument.getXrefTable().keySet().stream()
                           .mapToLong(COSObjectKey::getNumber)
                           .max().getAsLong());
            System.out.println("lastTrailer.size: " +
                cosDocument.getTrailer().getLong(COSName.SIZE));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        
System.out.println("------------------------------------------------------");
    }
}
{code}

*Console Output*

------------------------------------------------------
-- input
getXrefTable.keySet.maxNumber: 9lastTrailer.size: 10
------------------------------------------------------** lastEntry is: 5
------------------------------------------------------
-- output
getXrefTable.keySet.maxNumber: 9lastTrailer.size: 6
------------------------------------------------------

As the output shows:

   - The highest object number remains 9.
   - The final trailer size is incorrectly set to 6 after the incremental
   save, whereas it should remain 10.

*Impact*

   - PDF editors that use the trailer’s size to determine the next free
   object reference can fail to handle the PDF correctly.
   - Some readers may reject or treat the PDF as corrupted if they strictly
   follow the specification about ignored objects.

*Request*
Could you please confirm whether this is indeed a regression? I would
appreciate any workaround suggestions or information on whether a fix is
planned in upcoming releases.


> Wrong size entry in trailer after incremental save
> --------------------------------------------------
>
>                 Key: PDFBOX-5945
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5945
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Writing
>    Affects Versions: 2.0.33, 3.0.4 PDFBox, 4.0.0
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.34, 3.0.5 PDFBox, 4.0.0
>
>
> Lahcen FTIH reports the following issue on 
> [users@pdfbox|https://lists.apache.org/thread/23f3pzwntt1dhr7vq131d29k29k277gc]
>  
> *Issue Description*
> When performing an incremental update on a PDF document that has a
> cross-reference table (not a cross-reference stream) without creating any
> new object references, the *trailer’s size entry* is set incorrectly.
> Specifically, it becomes 1 greater than the highest object number in the
> *current* trailer—rather than 1 greater than the highest object number in
> the *entire* document. This leads to problems because:
> 1. Some PDF editors determine the next reference to use based on this
> size value, leading to inconsistencies.
> 2. Some PDF readers may consider the file corrupt since, according to
> the PDF specification, any object in a cross-reference section whose number
> is greater than the *size* value “shall be ignored and defined to be
> missing.”
> Below is the minimal code example to reproduce this issue. As you can see
> in the log output, the input PDF has a last trailer size of 10, but after
> the incremental save, the last trailer size becomes 6, even though the
> highest object number remains 9.
> {code:java}
> package org.apache.pdfbox;
> import org.apache.pdfbox.cos.COSDocument;import
> org.apache.pdfbox.cos.COSName;import
> org.apache.pdfbox.cos.COSObjectKey;import
> org.apache.pdfbox.pdfwriter.compress.CompressParameters;import
> org.apache.pdfbox.pdmodel.PDDocument;import
> org.apache.pdfbox.pdmodel.PDPage;import
> org.apache.pdfbox.pdmodel.PDResources;import
> org.apache.pdfbox.pdmodel.common.PDRectangle;import
> org.apache.pdfbox.pdmodel.font.PDFont;import
> org.apache.pdfbox.pdmodel.font.PDType1Font;import
> org.apache.pdfbox.pdmodel.font.Standard14Fonts;import
> org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;import
> org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;import
> org.apache.pdfbox.pdmodel.interactive.form.PDTextField;
> import java.io.ByteArrayOutputStream;import java.io.IOException;
> public class ReproduceBug {
>     private static final String FIELD_NAME = "textFieldName";
>     public static void main(String[] args) {
>         byte[] input = create();
>         printInfo(input, "input");
>         byte[] output = edit(input);
>         printInfo(output, "output");
>     }
>     private static byte[] create() {
>         try (PDDocument pdDocument = new PDDocument()) {
>             PDAcroForm acroForm = addAcroForm(pdDocument);
>             addFonts(acroForm);
>             PDPage page = addPage(pdDocument);
>             addTextField(acroForm, page);
>             ByteArrayOutputStream out = new ByteArrayOutputStream();
>             pdDocument.save(out, CompressParameters.NO_COMPRESSION);
>             return out.toByteArray();
>         } catch (IOException e) {
>             throw new RuntimeException(e);
>         }
>     }
>     private static byte[] edit(byte[] input) {
>         try (PDDocument pdDocument = loadPDF(input)) {
>             PDTextField textField = (PDTextField)
> pdDocument.getDocumentCatalog().getAcroForm().getField(FIELD_NAME);
>             assert textField != null;
>             textField.setMultiline(true);
>             ByteArrayOutputStream out = new ByteArrayOutputStream();
>             pdDocument.saveIncremental(out);
>             return out.toByteArray();
>         } catch (IOException e) {
>             throw new RuntimeException(e);
>         }
>     }
>     private static PDDocument loadPDF(byte[] content) throws IOException {
>         return org.apache.pdfbox.Loader.loadPDF(content);
>     }
>     private static void addFonts(PDAcroForm acroForm) throws IOException {
>         PDFont font1 = new PDType1Font(Standard14Fonts.FontName.HELVETICA);
>         PDFont font2 = new 
> PDType1Font(Standard14Fonts.FontName.ZAPF_DINGBATS);
>         PDResources resources = new PDResources();
>         resources.put(COSName.getPDFName("Helv"), font1);
>         resources.put(COSName.getPDFName("ZaDb"), font2);
>         acroForm.setDefaultResources(resources);
>     }
>     private static PDAcroForm addAcroForm(PDDocument pdDocument)
> throws IOException {
>         PDAcroForm acroForm = new PDAcroForm(pdDocument);
>         pdDocument.getDocumentCatalog().setAcroForm(acroForm);
>         return acroForm;
>     }
>     private static PDPage addPage(PDDocument pdDocument) throws IOException {
>         PDPage page = new PDPage(PDRectangle.A4);
>         pdDocument.addPage(page);
>         return page;
>     }
>     private static void addTextField(PDAcroForm acroForm, PDPage page)
> throws IOException {
>         PDTextField textField = new PDTextField(acroForm);
>         textField.setPartialName(FIELD_NAME);
>         acroForm.getFields().add(textField);
>         PDAnnotationWidget widget = textField.getWidgets().get(0);
>         widget.setPage(page);
>         page.getAnnotations().add(widget);
>         PDRectangle rectangle = new PDRectangle(10, 200, 200, 15);
>         widget.setRectangle(rectangle);
>     }
>     private static void printInfo(byte[] content, String name) {
>         
> System.out.println("------------------------------------------------------");
>         System.out.println("-- " + name);
>         try (PDDocument pdDocument = loadPDF(content)) {
>             COSDocument cosDocument = pdDocument.getDocument();
>             System.out.println("getXrefTable.keySet.maxNumber: " +
>                 cosDocument.getXrefTable().keySet().stream()
>                            .mapToLong(COSObjectKey::getNumber)
>                            .max().getAsLong());
>             System.out.println("lastTrailer.size: " +
>                 cosDocument.getTrailer().getLong(COSName.SIZE));
>         } catch (IOException e) {
>             throw new RuntimeException(e);
>         }
>         
> System.out.println("------------------------------------------------------");
>     }
> }
> {code}
> *Console Output*
> ------------------------------------------------------
> -- input
> getXrefTable.keySet.maxNumber: 9lastTrailer.size: 10
> ------------------------------------------------------** lastEntry is: 5
> ------------------------------------------------------
> -- output
> getXrefTable.keySet.maxNumber: 9lastTrailer.size: 6
> ------------------------------------------------------
> As the output shows:
>    - The highest object number remains 9.
>    - The final trailer size is incorrectly set to 6 after the incremental
>    save, whereas it should remain 10.
> *Impact*
>    - PDF editors that use the trailer’s size to determine the next free
>    object reference can fail to handle the PDF correctly.
>    - Some readers may reject or treat the PDF as corrupted if they strictly
>    follow the specification about ignored objects.
> *Request*
> Could you please confirm whether this is indeed a regression? I would
> appreciate any workaround suggestions or information on whether a fix is
> planned in upcoming releases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Reply via email to