[ 
https://issues.apache.org/jira/browse/PDFBOX-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807141#comment-17807141
 ] 

Tilman Hausherr commented on PDFBOX-5752:
-----------------------------------------

Here's some modified code based on yours to reproduce it:
{code:java}
    public static void main(String[] args) throws IOException
    {
        File dir = new File("XXXXXXXXXX");
        try (PDDocument targetDoc = Loader.loadPDF(new File(dir, "empty.pdf"));
             PDDocument doc2 = Loader.loadPDF(new File(dir, "roboto-14.pdf")))
        {
            PDPage sourcePage = doc2.getPage(0);
            final var copiedPage = targetDoc.importPage(sourcePage);
            copiedPage.setResources(sourcePage.getResources());
            targetDoc.save(new File(dir, "target.pdf"));
            
            PDResources res = targetDoc.getPage(1).getResources();
            for (COSName name : res.getFontNames())
            {
                PDFont font = res.getFont(name);
                System.out.println(name.getName() + " " + font.getName());
            }
        }
        try (PDDocument targetDoc = Loader.loadPDF(new File(dir, "target.pdf")))
        {
            
            System.out.println(targetDoc.getDocumentCatalog().getStructureTreeRoot().getCOSObject());
            
            PDResources res = targetDoc.getPage(1).getResources();
            for (COSName name : res.getFontNames())
            {
                PDFont font = res.getFont(name);
                System.out.println(name.getName() + " " + font.getName());
            }
        }
    }
{code}

Output (Windows-related warnings removed):

F1 BCDEEE+Roboto-Regular
F2 BCDFEE+Roboto-Regular
COSDictionary{COSName{Type}:COSName{ExtGState};COSName{BM}:COSName{Normal};COSName{ca}:COSInt{1};}
F1 BCDEEE+Calibri
F2 BCDFEE+Roboto-Regular
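
To make the difference between the two listings explicit, here is a minimal sketch in plain Java (no PDFBox involved) that compares the font names printed before saving with those printed after reloading. The class name FontNameDiff and the method changedFonts are hypothetical and exist only for this illustration; the map contents are copied from the output above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FontNameDiff
{
    // Returns the resource names whose font name differs between the two listings.
    static List<String> changedFonts(Map<String, String> before, Map<String, String> after)
    {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> entry : before.entrySet())
        {
            String reloaded = after.get(entry.getKey());
            if (!entry.getValue().equals(reloaded))
            {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args)
    {
        // Font names printed before saving (first listing above).
        Map<String, String> before = new LinkedHashMap<>();
        before.put("F1", "BCDEEE+Roboto-Regular");
        before.put("F2", "BCDFEE+Roboto-Regular");

        // Font names printed after reloading target.pdf (second listing above).
        Map<String, String> after = new LinkedHashMap<>();
        after.put("F1", "BCDEEE+Calibri");
        after.put("F2", "BCDFEE+Roboto-Regular");

        for (String name : changedFonts(before, after))
        {
            System.out.println(name + ": " + before.get(name) + " -> " + after.get(name));
        }
    }
}
```

With the values above, only F1 is reported as changed: it resolves to a Roboto subset before saving and to a Calibri subset after reloading.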


> font errors after copying a page to another document
> ----------------------------------------------------
>
>                 Key: PDFBOX-5752
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5752
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 3.0.1 PDFBox
>            Reporter: Christian Haegele
>            Priority: Major
>         Attachments: empty.pdf, image-2024-01-16-07-41-16-462.png, 
> image-2024-01-16-07-46-04-195.png, image-2024-01-16-07-47-05-883.png, 
> roboto-14.pdf, target-merged882552058302116763.pdf
>
>
> I try to import a page into a PDF document and copy the font resources. 
> With PDFBox 2.0 the code worked as expected: the result document 
> included the required embedded fonts. 
> Essentially I'm doing these steps in the code, where the first document is a 
> one-page empty PDF/A and the second document is a PDF/A containing the Roboto 
> font. All fonts are embedded.
>  
> {code:java}
> PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
> PDPage sourcePage = Loader.loadPDF(data).getPage(0);
> final var copiedPage = targetDoc.importPage(sourcePage);
> copiedPage.setResources(sourcePage.getResources());{code}
> In PDFBox 3.0 it doesn't seem to work any more; the document is corrupted if 
> you open it in Adobe Acrobat. 
> It shows a lot of errors if you open it with the PDFBox PreflightParser.
> Here are the error messages of the preflight parser:
> {{1.4 Trailer Syntax error, /XRef cross reference streams are not allowed}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is 
> missing from FontDescriptor}}
> {{3.1.14 Invalid Font definition, Unknown font type: XML}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.1.8 Invalid Font definition}}
> {{3.1.2 Invalid Font definition, BCDGEE+TimesNewRomanPS-BoldMT: some 
> mandatory fields are missing from the FontDescriptor: Type, ItalicAngle, 
> FontBBox, Ascent, FontName, StemV, Flags, CapHeight, Descent.}}
> {{3.1.3 Invalid Font definition, null: FontFile entry is missing from 
> FontDescriptor}}
> {{3.3.2 Glyph error, invalid font dictionary ==> }}
> And here is the complete test case. I used PDFBox 3.0.1 and the newest 
> snapshot version from 15.01.2024.
>  
> {code:java}
>    @Test
>     void importPageWithFonts_validateFontInfo() throws IOException {
>         // given
>         final var targetDocBytes = IOUtils.toByteArray(
>             PdfUtilitiesTest.class.getClassLoader().getResourceAsStream("empty.pdf"));
>         String[] additionalFiles = new String[]{
>             "roboto-14.pdf",
>         };
>         PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
>         // when
>         for (String fileName : Arrays.asList(additionalFiles)) {
>             byte[] data = IOUtils.toByteArray(
>                 PdfUtilitiesTest.class.getClassLoader().getResourceAsStream(fileName));
>             // verify source is valid
>             PDPage sourcePage = Loader.loadPDF(data).getPage(0);
>             final var copiedPage = targetDoc.importPage(sourcePage);
>             copiedPage.setResources(sourcePage.getResources());
>             targetDoc.save(Files.createTempFile("merged-fonts", ".pdf").toFile());
>         }
>         Path tmpFile = Files.createTempFile("fscd-merged", ".pdf");
>         targetDoc.save(tmpFile.toFile(), CompressParameters.DEFAULT_COMPRESSION);
>         // then
>         // font errors, e.g. Invalid Font definition, BCDFEE+Roboto-Regular:
>         // FontFile entry is missing from FontDescriptor
>         assertFontsAreValid(tmpFile);
>     }
>     private static void assertFontsAreValid(Path tmpFile) throws IOException {
>         PreflightParser parser = new PreflightParser(tmpFile.toFile());
>         final var documentToVerify = (PreflightDocument) parser.parse();
>         // Get validation result
>         final var result = documentToVerify.validate();
>         final var resultString = result.getErrorsList().stream()
>             // filter findings from the source documents
>             .filter(err -> !err.getErrorCode()
>                 .matches("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3"))
>             .map(err -> err.getErrorCode() + " " + err.getDetails())
>             .collect(Collectors.joining("\n"));
>         assertTrue(resultString.isBlank(), resultString);
>     }
> {code}
>  
> The problem is still present with the snapshot version 
> 3.0.2-2024-0115.083906-63.
>  
> Here is the preflight parser output of the snapshot version:
> {{1.4 Trailer Syntax error, /XRef cross reference streams are not allowed}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
> {{3.3.1 Glyph error, The character code 0 in the font program 
> "BCDEEE+Calibri" is missing from the Character Encoding}}
>  
> The input displays correctly:
> !image-2024-01-16-07-47-05-883.png!
> The output file doesn't display the font correctly:
> !image-2024-01-16-07-46-04-195.png!
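
The preflight output quoted above repeats the same message many times, which makes it hard to see how many distinct problems there are. A minimal sketch in plain Java (no PDFBox involved) of collapsing such output into counts follows. The class name PreflightTally and the method tally are hypothetical, and the sample messages are copied from the snapshot output quoted above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PreflightTally
{
    // Counts identical messages, keeping the order in which they were first seen.
    static Map<String, Integer> tally(List<String> messages)
    {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String message : messages)
        {
            counts.merge(message, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        String glyphError = "3.3.1 Glyph error, The character code 0 in the font program "
                + "\"BCDEEE+Calibri\" is missing from the Character Encoding";
        // One trailer error followed by five identical glyph errors, as in the snapshot output.
        List<String> messages = List.of(
                "1.4 Trailer Syntax error, /XRef cross reference streams are not allowed",
                glyphError, glyphError, glyphError, glyphError, glyphError);

        tally(messages).forEach((message, count) -> System.out.println(count + "x " + message));
    }
}
```

In a test like the one quoted, the input list could be built from the error code and details of each validation error instead of hard-coded strings.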


