[https://issues.apache.org/jira/browse/PDFBOX-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Christian Haegele updated PDFBOX-5752:
--------------------------------------
Description:
I am trying to import a page into a PDF document and copy the page's font resources along with it.
With PDFBox 2.0 the code worked as expected: the resulting document contained the required embedded fonts.
Essentially, the code performs the steps below. The first (target) document is an empty one-page PDF/A file; the second (source) document, also PDF/A, contains the Roboto font. All fonts are embedded.
{code:java}
PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
PDPage sourcePage = Loader.loadPDF(data).getPage(0);
final var copiedPage = targetDoc.importPage(sourcePage);
copiedPage.setResources(sourcePage.getResources());{code}
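A self-contained sketch of the same steps, differing from the snippet above only in that the source PDDocument is held in a variable and closed after the target has been saved. The idea, which is an assumption and not a confirmed fix for this issue, is that objects reachable from the copied page (such as the font resources passed to setResources) may still be read from the source document when the target is written. ImportPageSketch and importFirstPage are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public final class ImportPageSketch {

    /**
     * Hypothetical helper: imports the first page of sourceBytes into the
     * document loaded from targetBytes and writes the result to out.
     */
    public static void importFirstPage(byte[] targetBytes, byte[] sourceBytes, File out)
            throws IOException {
        // Both documents stay open until the target has been saved;
        // try-with-resources closes them afterwards.
        try (PDDocument targetDoc = Loader.loadPDF(targetBytes);
             PDDocument sourceDoc = Loader.loadPDF(sourceBytes)) {
            PDPage sourcePage = sourceDoc.getPage(0);
            PDPage copiedPage = targetDoc.importPage(sourcePage);
            // Same as in the report: reuse the source page's resources on the copy.
            copiedPage.setResources(sourcePage.getResources());
            targetDoc.save(out);
        }
    }
}
```

With the byte arrays from the report this would be called as `ImportPageSketch.importFirstPage(targetDocBytes, data, outFile)`.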
In PDFBox 3.0 this no longer seems to work: the resulting document appears corrupted when opened in Adobe Acrobat, and validating it with the PDFBox PreflightParser produces many errors.
These are the preflight messages:
{{1.4 Trailer Syntax error, /XRef cross reference streams are not allowed}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor}}
{{3.1.14 Invalid Font definition, Unknown font type: XML}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.1.8 Invalid Font definition}}
{{3.1.2 Invalid Font definition, BCDGEE+TimesNewRomanPS-BoldMT: some mandatory fields are missing from the FontDescriptor: Type, ItalicAngle, FontBBox, Ascent, FontName, StemV, Flags, CapHeight, Descent.}}
{{3.1.3 Invalid Font definition, null: FontFile entry is missing from FontDescriptor}}
{{3.3.2 Glyph error, invalid font dictionary ==> }}
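To check which copied fonts lost their embedded font program (the 3.1.3 findings), one can walk a page's font resources and test each FontDescriptor for a FontFile, FontFile2 or FontFile3 stream, running it on the source page and on the imported page for comparison. A minimal diagnostic sketch; FontReport and describeFonts are hypothetical names, and only fonts in the page's own /Resources are inspected (fonts inside form XObjects or annotation appearances are not, and composite Type 0 fonts may need to be checked via their descendant font).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDFontDescriptor;

public final class FontReport {

    /** Returns one descriptive line per font in the page's /Resources. */
    public static List<String> describeFonts(PDPage page) throws IOException {
        List<String> lines = new ArrayList<>();
        PDResources resources = page.getResources();
        if (resources == null) {
            return lines; // page has no (inherited) resources at all
        }
        for (COSName key : resources.getFontNames()) {
            PDFont font = resources.getFont(key);
            if (font == null) {
                lines.add(key.getName() + ": entry is not a font dictionary");
                continue;
            }
            PDFontDescriptor fd = font.getFontDescriptor();
            // Report which of the three font-program keys are present.
            String files = fd == null
                    ? "no FontDescriptor"
                    : "FontFile=" + (fd.getFontFile() != null)
                            + " FontFile2=" + (fd.getFontFile2() != null)
                            + " FontFile3=" + (fd.getFontFile3() != null);
            lines.add(key.getName() + " " + font.getName()
                    + " (" + font.getClass().getSimpleName() + ")"
                    + " embedded=" + font.isEmbedded() + " " + files);
        }
        return lines;
    }
}
```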
Here is the complete test case. I used PDFBox 3.0.1 and the latest snapshot version from 15.01.2024.
{code:java}
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Collectors;

import org.apache.commons.io.IOUtils; // assumption: the report does not say which IOUtils is used
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdfwriter.compress.CompressParameters;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.parser.PreflightParser;
import org.junit.jupiter.api.Test;

class PdfUtilitiesTest {

    @Test
    void importPageWithFonts_validateFontInfo() throws IOException {
        // given
        final var targetDocBytes = IOUtils.toByteArray(
                PdfUtilitiesTest.class.getClassLoader().getResourceAsStream("empty.pdf"));
        String[] additionalFiles = new String[]{
                "roboto-14.pdf",
        };
        PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
        // when
        for (String fileName : Arrays.asList(additionalFiles)) {
            byte[] data = IOUtils.toByteArray(
                    PdfUtilitiesTest.class.getClassLoader().getResourceAsStream(fileName));
            // verify source is valid
            PDPage sourcePage = Loader.loadPDF(data).getPage(0);
            final var copiedPage = targetDoc.importPage(sourcePage);
            copiedPage.setResources(sourcePage.getResources());
            targetDoc.save(Files.createTempFile("merged-fonts", ".pdf").toFile());
        }
        Path tmpFile = Files.createTempFile("fscd-merged", ".pdf");
        targetDoc.save(tmpFile.toFile(), CompressParameters.DEFAULT_COMPRESSION);
        // then
        // font errors, e.g. Invalid Font definition, BCDFEE+Roboto-Regular:
        // FontFile entry is missing from FontDescriptor
        assertFontsAreValid(tmpFile);
    }

    private static void assertFontsAreValid(Path tmpFile) throws IOException {
        PreflightParser parser = new PreflightParser(tmpFile.toFile());
        final var documentToVerify = (PreflightDocument) parser.parse();
        // Get validation result
        final var result = documentToVerify.validate();
        final var resultString = result.getErrorsList().stream()
                // filter findings from the source documents
                .filter(err -> !err.getErrorCode()
                        .matches("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3"))
                .map(err -> err.getErrorCode() + " " + err.getDetails())
                .collect(Collectors.joining("\n"));
        assertTrue(resultString.isBlank(), resultString);
    }
}
{code}
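The 1.4 entry is consistent with the save call in the test: with CompressParameters.DEFAULT_COMPRESSION, PDFBox 3.0 writes object streams and a cross-reference stream, and PDF/A-1, which is the profile Preflight validates, does not allow cross-reference streams. A sketch of writing the target with a classic cross-reference table instead; this presumably removes only the 1.4 finding and leaves the font errors untouched. PdfA1Save and saveWithoutXRefStream are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdfwriter.compress.CompressParameters;
import org.apache.pdfbox.pdmodel.PDDocument;

public final class PdfA1Save {

    /** Hypothetical helper: saves without object streams, i.e. with a classic xref table. */
    public static void saveWithoutXRefStream(PDDocument doc, File out) throws IOException {
        doc.save(out, CompressParameters.NO_COMPRESSION);
    }
}
```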
The problem is still present with snapshot version 3.0.2-2024-0115.083906-63.
This is the preflight output for the snapshot version:
{{1.4 Trailer Syntax error, /XRef cross reference streams are not allowed}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
{{3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding}}
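Because these listings are dominated by one repeated message, a small helper that collapses them into counts per error code makes it easier to compare runs, for example 3.0.1 against the snapshot. A sketch in plain Java, assuming each line has the `code + " " + details` format built in assertFontsAreValid; PreflightSummary and countByCode are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class PreflightSummary {

    /**
     * Counts findings per error code, keeping the order in which codes first appear.
     * Assumes the code is the first whitespace-separated token of each line.
     */
    public static Map<String, Integer> countByCode(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.strip();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines, e.g. from an empty result string
            }
            String code = trimmed.split("\\s+", 2)[0];
            counts.merge(code, 1, Integer::sum);
        }
        return counts;
    }
}
```

Applied to the snapshot findings, it would report one 1.4 entry and five 3.3.1 entries.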
The input displays correctly:
!image-2024-01-16-07-47-05-883.png!
The output file doesn't display the font correctly:
!image-2024-01-16-07-46-04-195.png!
> Font errors after copying a page to another document
> ----------------------------------------------------
>
> Key: PDFBOX-5752
> URL: https://issues.apache.org/jira/browse/PDFBOX-5752
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 3.0.1 PDFBox
> Reporter: Christian Haegele
> Priority: Critical
> Attachments: empty.pdf, image-2024-01-16-07-41-16-462.png,
> image-2024-01-16-07-46-04-195.png, image-2024-01-16-07-47-05-883.png,
> roboto-14.pdf, target-merged882552058302116763.pdf
--
This message was sent by Atlassian Jira
(v8.20.10#820010)