CHristian Haegele created PDFBOX-5752:
-----------------------------------------

             Summary: font errors after copying a page to another document
                 Key: PDFBOX-5752
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5752
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 3.0.1 PDFBox
            Reporter: CHristian Haegele
         Attachments: empty.pdf, image-2024-01-16-07-41-16-462.png, 
image-2024-01-16-07-46-04-195.png, image-2024-01-16-07-47-05-883.png, 
roboto-14.pdf, target-merged882552058302116763.pdf

I am trying to import a page into a PDF document and copy its font resources. 
With PDFBox 2.0 the code worked as expected: the result document included the 
required embedded fonts. 

Essentially, the code performs the steps below. The first document is a PDF/A 
with one empty page, and the second document is a PDF/A that uses the Roboto 
font. All fonts are embedded.

 
{code:java}
PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
PDPage sourcePage = Loader.loadPDF(data).getPage(0);
final var copiedPage = targetDoc.importPage(sourcePage);
copiedPage.setResources(sourcePage.getResources());{code}

With PDFBox 3.0 this no longer seems to work: the resulting document is 
reported as corrupted when it is opened in Adobe Acrobat. 

The PDFBox PreflightParser also reports many errors for it.

Here are the error messages from the PreflightParser:

{{1.4 Trailer Syntax error, /XRef cross reference streams are not allowed}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is 
missing from FontDescriptor}}
{{3.1.14 Invalid Font definition, Unknown font type: XML}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.1.8 Invalid Font definition}}
{{3.1.2 Invalid Font definition, BCDGEE+TimesNewRomanPS-BoldMT: some mandatory 
fields are missing from the FontDescriptor: Type, ItalicAngle, FontBBox, 
Ascent, FontName, StemV, Flags, CapHeight, Descent.}}
{{3.1.3 Invalid Font definition, null: FontFile entry is missing from 
FontDescriptor}}
{{3.3.2 Glyph error, invalid font dictionary ==> }}
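
The listing above repeats the same 3.3.1 finding many times. As a reading aid, here is a minimal sketch of how such output could be condensed to one line per distinct message with a count. It uses only the Java standard library and works on plain strings rather than on PDFBox's validation error objects; the class name PreflightSummary and the method summarize are hypothetical and not part of PDFBox.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PreflightSummary {

    /** Counts identical messages, keeping the order in which they first appear. */
    static Map<String, Long> summarize(List<String> messages) {
        return messages.stream()
            .collect(Collectors.groupingBy(
                message -> message,      // group by the full message text
                LinkedHashMap::new,      // preserve first-seen order
                Collectors.counting())); // count occurrences per message
    }

    public static void main(String[] args) {
        // A few lines taken from the report above, as plain strings.
        List<String> messages = List.of(
            "1.4 Trailer Syntax error, /XRef cross reference streams are not allowed",
            "3.3.1 Glyph error, The character code 0 in the font program \"BCDEEE+Calibri\" is missing from the Character Encoding",
            "3.3.1 Glyph error, The character code 0 in the font program \"BCDEEE+Calibri\" is missing from the Character Encoding",
            "3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor");

        summarize(messages).forEach(
            (message, count) -> System.out.println(count + "x " + message));
    }
}
```

In the test case, the input list could be built from the error code and details of each finding in the same way the joined result string is built.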




Here is the complete test case. I used PDFBox 3.0.1 and the latest snapshot 
version from 15.01.2024.

 
{code:java}
    @Test
    void importPageWithFonts_validateFontInfo() throws IOException {
        // given
        final var targetDocBytes = IOUtils.toByteArray(
            PdfUtilitiesTest.class.getClassLoader().getResourceAsStream("empty.pdf"));
        String[] additionalFiles = new String[]{
            "roboto-14.pdf",
        };
        PDDocument targetDoc = Loader.loadPDF(targetDocBytes);

        // when
        for (String fileName : Arrays.asList(additionalFiles)) {
            byte[] data = IOUtils.toByteArray(
                PdfUtilitiesTest.class.getClassLoader().getResourceAsStream(fileName));
            // verify source is valid
            PDPage sourcePage = Loader.loadPDF(data).getPage(0);
            final var copiedPage = targetDoc.importPage(sourcePage);
            copiedPage.setResources(sourcePage.getResources());
            targetDoc.save(Files.createTempFile("merged-fonts", ".pdf").toFile());
        }
        Path tmpFile = Files.createTempFile("fscd-merged", ".pdf");
        targetDoc.save(tmpFile.toFile(), CompressParameters.DEFAULT_COMPRESSION);

        // then
        // font errors, e.g. Invalid Font definition, BCDFEE+Roboto-Regular:
        // FontFile entry is missing from FontDescriptor
        assertFontsAreValid(tmpFile);
    }

    private static void assertFontsAreValid(Path tmpFile) throws IOException {
        PreflightParser parser = new PreflightParser(tmpFile.toFile());
        final var documentToVerify = (PreflightDocument) parser.parse();
        // Get validation result
        final var result = documentToVerify.validate();
        final var resultString = result.getErrorsList().stream()
            // filter findings from the source documents
            .filter(err -> !err.getErrorCode()
                .matches("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3"))
            .map(err -> err.getErrorCode() + " " + err.getDetails())
            .collect(Collectors.joining("\n"));
        assertTrue(resultString.isBlank(), resultString);
    }
{code}
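
The helper in the test case drops findings whose error code matches a fixed alternation, so that only findings not already present in the source documents remain. A minimal sketch of that filter in isolation, using plain strings and only the Java standard library, is shown below. The class name ErrorCodeFilter and the method keepRelevant are hypothetical. Because the whole code must match, "3.1.11" is ignored while "3.1.3" is kept.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ErrorCodeFilter {

    // Same alternation as in the test case, compiled once instead of per element.
    private static final Pattern IGNORED =
        Pattern.compile("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3");

    /** Returns the codes that are not on the ignore list, in their original order. */
    static List<String> keepRelevant(List<String> errorCodes) {
        return errorCodes.stream()
            // matches() requires the entire code to match, not just a prefix
            .filter(code -> !IGNORED.matcher(code).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [3.1.3, 3.3.1]
        System.out.println(keepRelevant(List.of("3.1.11", "3.1.3", "3.3.1", "2.4.3")));
    }
}
```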
 

The problem is still present with the snapshot version 
3.0.2-2024-0115.083906-63.

 

Here is the preflight parser output for the snapshot version:

{{1.4 Trailer Syntax error, /XRef cross reference streams are not allowed}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" 
is missing from the Character Encoding}}

 

The input displays correctly:

!image-2024-01-16-07-47-05-883.png!

The output file doesn't display the font correctly:

!image-2024-01-16-07-46-04-195.png!



