Andreas Lehmkühler created PDFBOX-5945: ------------------------------------------
Summary: Wrong size entry in trailer after incremental save Key: PDFBOX-5945 URL: https://issues.apache.org/jira/browse/PDFBOX-5945 Project: PDFBox Issue Type: Bug Components: Writing Affects Versions: 3.0.4 PDFBox Reporter: Andreas Lehmkühler Assignee: Andreas Lehmkühler Lahcen FTIH reports the following issue on users@pdfbox *Issue Description* When performing an incremental update on a PDF document that has a cross-reference table (not a cross-reference stream) without creating any new object references, the *trailer’s size entry* is set incorrectly. Specifically, it becomes 1 greater than the highest object number in the *current* trailer—rather than 1 greater than the highest object number in the *entire* document. This leads to problems because: 1. Some PDF editors determine the next reference to use based on this size value, leading to inconsistencies. 2. Some PDF readers may consider the file corrupt since, according to the PDF specification, any object in a cross-reference section whose number is greater than the *size* value “shall be ignored and defined to be missing.” Below is the minimal code example to reproduce this issue. As you can see in the log output, the input PDF has a last trailer size of 10, but after the incremental save, the last trailer size becomes 6, even though the highest object number remains 9. {code:java} package org.apache.pdfbox; import org.apache.pdfbox.cos.COSDocument;import org.apache.pdfbox.cos.COSName;import org.apache.pdfbox.cos.COSObjectKey;import org.apache.pdfbox.pdfwriter.compress.CompressParameters;import org.apache.pdfbox.pdmodel.PDDocument;import org.apache.pdfbox.pdmodel.PDPage;import org.apache.pdfbox.pdmodel.PDResources;import org.apache.pdfbox.pdmodel.common.PDRectangle;import org.apache.pdfbox.pdmodel.font.PDFont;import org.apache.pdfbox.pdmodel.font.PDType1Font;import org.apache.pdfbox.pdmodel.font.Standard14Fonts;import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;import org.apache.pdfbox.pdmodel.interactive.form.PDTextField; import java.io.ByteArrayOutputStream;import java.io.IOException; public class ReproduceBug { private static final String FIELD_NAME = "textFieldName"; public static void main(String[] args) { byte[] input = create(); printInfo(input, "input"); byte[] output = edit(input); printInfo(output, "output"); } private static byte[] create() { try (PDDocument pdDocument = new PDDocument()) { PDAcroForm acroForm = addAcroForm(pdDocument); addFonts(acroForm); PDPage page = addPage(pdDocument); addTextField(acroForm, page); ByteArrayOutputStream out = new ByteArrayOutputStream(); pdDocument.save(out, CompressParameters.NO_COMPRESSION); return out.toByteArray(); } catch (IOException e) { throw new RuntimeException(e); } } private static byte[] edit(byte[] input) { try (PDDocument pdDocument = loadPDF(input)) { PDTextField textField = (PDTextField) pdDocument.getDocumentCatalog().getAcroForm().getField(FIELD_NAME); assert textField != null; textField.setMultiline(true); ByteArrayOutputStream out = new ByteArrayOutputStream(); pdDocument.saveIncremental(out); return out.toByteArray(); } catch (IOException e) { throw new RuntimeException(e); } } private static PDDocument loadPDF(byte[] content) throws IOException { return org.apache.pdfbox.Loader.loadPDF(content); } private static void addFonts(PDAcroForm acroForm) throws IOException { PDFont font1 = new PDType1Font(Standard14Fonts.FontName.HELVETICA); PDFont font2 = new PDType1Font(Standard14Fonts.FontName.ZAPF_DINGBATS); PDResources resources = new PDResources(); resources.put(COSName.getPDFName("Helv"), font1); resources.put(COSName.getPDFName("ZaDb"), font2); acroForm.setDefaultResources(resources); } private static PDAcroForm addAcroForm(PDDocument pdDocument) throws IOException { PDAcroForm acroForm = new PDAcroForm(pdDocument); pdDocument.getDocumentCatalog().setAcroForm(acroForm); return acroForm; } private static PDPage addPage(PDDocument pdDocument) throws IOException { PDPage page = new PDPage(PDRectangle.A4); pdDocument.addPage(page); return page; } private static void addTextField(PDAcroForm acroForm, PDPage page) throws IOException { PDTextField textField = new PDTextField(acroForm); textField.setPartialName(FIELD_NAME); acroForm.getFields().add(textField); PDAnnotationWidget widget = textField.getWidgets().get(0); widget.setPage(page); page.getAnnotations().add(widget); PDRectangle rectangle = new PDRectangle(10, 200, 200, 15); widget.setRectangle(rectangle); } private static void printInfo(byte[] content, String name) { System.out.println("------------------------------------------------------"); System.out.println("-- " + name); try (PDDocument pdDocument = loadPDF(content)) { COSDocument cosDocument = pdDocument.getDocument(); System.out.println("getXrefTable.keySet.maxNumber: " + cosDocument.getXrefTable().keySet().stream() .mapToLong(COSObjectKey::getNumber) .max().getAsLong()); System.out.println("lastTrailer.size: " + cosDocument.getTrailer().getLong(COSName.SIZE)); } catch (IOException e) { throw new RuntimeException(e); } System.out.println("------------------------------------------------------"); } } {code} *Console Output* ------------------------------------------------------ -- input getXrefTable.keySet.maxNumber: 9lastTrailer.size: 10 ------------------------------------------------------** lastEntry is: 5 ------------------------------------------------------ -- output getXrefTable.keySet.maxNumber: 9lastTrailer.size: 6 ------------------------------------------------------ As the output shows: - The highest object number remains 9. - The final trailer size is incorrectly set to 6 after the incremental save, whereas it should remain 10. *Impact* - PDF editors that use the trailer’s size to determine the next free object reference can fail to handle the PDF correctly. - Some readers may reject or treat the PDF as corrupted if they strictly follow the specification about ignored objects. *Request* Could you please confirm whether this is indeed a regression? I would appreciate any workaround suggestions or information on whether a fix is planned in upcoming releases. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org