Mruthyunjaya S created PDFBOX-6163:
--------------------------------------
Summary: Technical Issue Report: Intermittent Data Loss and Font
Missing Errors during PDF Mergin
Key: PDFBOX-6163
URL: https://issues.apache.org/jira/browse/PDFBOX-6163
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.33
Environment: Production environment, Windows Server 2019
Reporter: Mruthyunjaya S
h3. *Technical Issue Report: Intermittent Data Loss and Font Missing Errors
during PDF Merging*
*Environment:*
* *Library:* PDFBox 2.0.33
* *Operating System:* Windows Server 2019
* *Application Context:* Windchill MethodServer
* *Affected Fonts:* Japanese fonts, specifically *MS Gothic* (Subsetted)
*Issue Summary:* We are experiencing a high failure rate (*80% failure*)
when merging technical CAD drawings into a single PDF package. Even though the
source PDFs have fonts embedded, the generated "Merged PDF" frequently suffers
from missing fonts or garbled text on specific pages.
*Key Observations:*
* *Reproducibility:* In a test of 10 consecutive merge attempts, the output
was correct only *2 times*, while the remaining *8 attempts* resulted in
missing fonts.
* *Specific Error:* The issue is almost exclusively linked to Japanese *MS
Gothic* variants (e.g., {{ACWDKT+MS Gothic}}).
* *Error Logs:* We frequently encounter
{{Format 14 cmap table is not supported}} and
{{Format 12 cmap contains an invalid glyph index}} warnings during the process.
* *Client Behavior:* Adobe Acrobat fails to render the text, displaying the
error: _"Cannot extract the embedded font 'ACWDKT+MS Gothic'. Some characters
may not display or print correctly."_
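For reference, a name such as {{ACWDKT+MS Gothic}} follows the PDF convention for subsetted fonts: a tag of six uppercase letters, a plus sign, then the base font name. The following is a minimal JDK-only sketch of how such names can be told apart from full fonts. The class {{SubsetFontName}} and its methods are hypothetical names introduced for illustration and are not part of PDFBox.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a PDFBox class): splits a PDF font name into subset tag and base name. */
final class SubsetFontName {
    // A subset prefix is exactly six uppercase ASCII letters followed by '+'.
    private static final Pattern SUBSET = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    private SubsetFontName() {}

    /** True if the name carries a subset tag, e.g. "ACWDKT+MS Gothic". */
    static boolean isSubset(String fontName) {
        return fontName != null && SUBSET.matcher(fontName).matches();
    }

    /** Returns the six-letter tag, or empty if the name has no subset prefix. */
    static Optional<String> subsetTag(String fontName) {
        if (fontName == null) {
            return Optional.empty();
        }
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Returns the name without the subset prefix; names without a prefix are returned unchanged. */
    static String baseName(String fontName) {
        if (fontName == null) {
            return null;
        }
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }
}
```

A check like this is stricter than testing for a {{+}} anywhere in the name, which would also match fonts that are not subsets.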
*Suspected Causes:*
# *I/O Race Condition:* We intermittently receive
{{java.io.IOException: Missing root object specification in trailer}},
suggesting the merger may be accessing files before they are fully flushed to
disk or while they are still locked by the external converter.
# *Resource Clashing:* We suspect that {{PDFMergerUtility}} may produce
clashing font resource aliases (like {{/F1}}) across different subsetted
drawings, leading to corrupted Character Maps (CMaps) in the final document.
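The first suspected cause could be screened for before a file is handed to the merger. The following is a minimal sketch using only the JDK; {{PdfReadinessCheck}} and its methods are hypothetical names, not PDFBox APIs. It assumes a finished file starts with {{%PDF-}} and carries an {{%%EOF}} marker within its last 1024 bytes. This is only a heuristic that the writer has finished, not proof that the PDF is valid.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

/** Hypothetical pre-merge check (not a PDFBox class). */
final class PdfReadinessCheck {
    private static final String HEADER = "%PDF-";
    private static final String EOF_MARKER = "%%EOF";
    private static final int TAIL_BYTES = 1024; // assumed search window for the EOF marker

    private PdfReadinessCheck() {}

    /** Pure check on the first and last bytes of a file. */
    static boolean hasHeaderAndEof(byte[] head, byte[] tail) {
        String h = new String(head, StandardCharsets.ISO_8859_1);
        String t = new String(tail, StandardCharsets.ISO_8859_1);
        return h.startsWith(HEADER) && t.contains(EOF_MARKER);
    }

    /** Reads only the start and end of the file; returns false if it cannot be read. */
    static boolean looksComplete(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            if (len < HEADER.length()) {
                return false; // empty or still being created
            }
            byte[] head = new byte[HEADER.length()];
            raf.readFully(head);
            int tailLen = (int) Math.min(TAIL_BYTES, len);
            byte[] tail = new byte[tailLen];
            raf.seek(len - tailLen);
            raf.readFully(tail);
            return hasHeaderAndEof(head, tail);
        } catch (IOException e) {
            // Missing, locked, or unreadable: treat as not ready yet.
            return false;
        }
    }
}
```

A caller could retry {{looksComplete()}} with a short delay and only pass the file to {{addSource()}} once it returns true, instead of relying on a fixed size threshold.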
*Current Code Implementation:* We have attempted to fix this by implementing a
*Targeted Healing Pass*. We re-open the merged PDF, scan for corrupted
subsets, and re-embed a full English *Century Gothic* font using
{{PDResources.put()}} and {{PDPageContentStream.AppendMode.APPEND}}.
Despite this, the inconsistency persists.
# Is there a built-in mechanism in {{PDFMergerUtility}} to "flatten" or
deduplicate subsetted fonts during the merge to prevent CMap clashing?
# Given that Format 14/12 warnings are logged but don't throw exceptions, is
there a recommended way to programmatically detect this "data loss" state
before the file is saved?
# Are there known issues with {{setupTempFileOnly()}} vs
{{setupMainMemoryOnly()}} when dealing with large, complex vector drawings that
might contribute to trailer parsing failures?
======================= Code I used for the merge =====================
private void mergeUsingPDFBox(List<String> pdfFiles, String outputFile) throws IOException {
    PDFMergerUtility merger = new PDFMergerUtility();
    merger.setDestinationFileName(outputFile);
    for (String file : pdfFiles) {
        merger.addSource(new File(file));
    }
    merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
}
=====================================================
===================== I tried to fix the missing embedded fonts by
re-embedding them, but the issue persists ===============================
private void mergeUsingPDFBox(List<String> pdfFiles, String outputPath) throws IOException {
    org.apache.pdfbox.multipdf.PDFMergerUtility merger =
            new org.apache.pdfbox.multipdf.PDFMergerUtility();
    merger.setDestinationFileName(outputPath);
    System.out.println("\n[PHASE 1] Initial PDF Merging...");
    for (String filePath : pdfFiles) {
        File sourceFile = new File(filePath);
        // LOGIC: Prevent "Missing root object specification in trailer" error
        // This happens if we try to merge a file that is still 0-bytes (locked by converter)
        if (sourceFile.exists() && sourceFile.length() > 100) {
            merger.addSource(sourceFile);
            System.out.println(" --> Added to merge queue: "
                    + sourceFile.getName() + " [" + sourceFile.length() + " bytes]");
        } else {
            System.out.println(" WARN: Skipping empty/invalid file (might be locked by conversion): "
                    + filePath);
        }
    }
    // Execute merge using Main Memory to protect Windchill server heap
    merger.mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting.setupMainMemoryOnly());
    System.out.println("[PHASE 2] Merge Saved to Disk. Starting CMap Corruption Scan...");
    // RE-OPEN the result to identify and heal Japanese encoding issues
    try (PDDocument mergedDoc = PDDocument.load(new File(outputPath))) {
        // LOGIC: Remove security. Modified content streams are blocked if owner password exists.
        if (mergedDoc.isEncrypted()) {
            System.out.println("DEBUG: Removing encryption to allow font re-embedding...");
            mergedDoc.setAllSecurityToBeRemoved(true);
        }
        // Load replacement font ONCE per document loop for memory efficiency
        String fontPath = getFontFileFor("GOTHIC");
        try (InputStream fontStream = new FileInputStream(new File(fontPath))) {
            // Load FULL font (no subsetting) to ensure all English character indices exist
            PDType0Font englishFont = PDType0Font.load(mergedDoc, fontStream, false);
            int repairCount = 0;
            for (int i = 0; i < mergedDoc.getNumberOfPages(); i++) {
                PDPage page = mergedDoc.getPage(i);
                PDResources res = page.getResources();
                if (res == null) continue;
                boolean pageHasError = false;
                // STEP A: Detect CMap Corruption (Format 12/14 warnings)
                for (COSName fontAlias : res.getFontNames()) {
                    PDFont font = res.getFont(fontAlias);
                    // Use our helper to force a check of the internal font mapping
                    if (isFontCorrupted(font)) {
                        System.out.println(" ALERT: Page " + (i + 1)
                                + " contains corrupted Japanese CMap/Subsets. Repairing...");
                        pageHasError = true;
                        break;
                    }
                }
                // STEP B: Heal the problematic page
                if (pageHasError) {
                    for (COSName fontAlias : res.getFontNames()) {
                        String name = res.getFont(fontAlias).getName().toUpperCase();
                        // LOGIC: Target Gothic subsets (+) that failed the validation check
                        if (name.contains("GOTHIC") || name.contains("+") || name.contains("MS-")) {
                            res.put(fontAlias, englishFont);
                            repairCount++;
                        }
                    }
                    // LOGIC: Re-render the operational stream
                    // Adding a space forces the PDF viewer to reload the character map using our new font
                    try (PDPageContentStream cs = new PDPageContentStream(mergedDoc, page,
                            PDPageContentStream.AppendMode.APPEND, true, true)) {
                        cs.beginText();
                        cs.setFont(englishFont, 1);
                        cs.newLineAtOffset(0, 0);
                        cs.showText(" ");
                        cs.endText();
                    }
                }
            }
            System.out.println("INFO : Total font mappings repaired during final pass: " + repairCount);
        }
        // Final Save: Overwrite the merged file with the high-fidelity English version
        mergedDoc.save(outputPath);
        System.out.println("[PHASE 3] Final healing pass complete. Output verified.");
    } catch (Exception e) {
        System.err.println("CRITICAL ERROR: Failed to heal the merged PDF: " + e.getMessage());
        e.printStackTrace();
    }
    System.out.println(">>> SUCCESS! High-Fidelity PDF saved to: " + outputPath + "\n");
}
=================================================================
--
*Logs Captured during Merge:* Our internal diagnostic tools show the following
warnings from FontBox during the failing merges:
* {{org.apache.fontbox.ttf.CmapSubtable: Format 14 cmap table is not supported and will be ignored}}
* {{org.apache.fontbox.ttf.CmapSubtable: Format 12 cmap contains an invalid glyph index}}
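Because these warnings are only logged and never thrown, one way to react to them in code is to listen on the logger itself. The following is a minimal sketch, assuming that Commons Logging (which PDFBox 2.x logs through) is routed to {{java.util.logging}} in this deployment. If Log4j or another backend is configured, this handler will see nothing. {{CmapWarningCollector}} is a hypothetical class name; the logger name is taken from the log lines above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical helper: records WARNING-or-higher messages from the FontBox cmap logger. */
final class CmapWarningCollector extends Handler implements AutoCloseable {
    // Logger name as it appears in the captured log lines.
    private static final String LOGGER_NAME = "org.apache.fontbox.ttf.CmapSubtable";

    // Keep a strong reference so the logger (and this handler) is not garbage collected.
    private final Logger logger = Logger.getLogger(LOGGER_NAME);
    private final List<String> warnings = new CopyOnWriteArrayList<>();

    CmapWarningCollector() {
        logger.addHandler(this);
    }

    @Override
    public void publish(LogRecord record) {
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            warnings.add(String.valueOf(record.getMessage()));
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered.
    }

    /** Detaches the handler; messages logged afterwards are not recorded. */
    @Override
    public void close() {
        logger.removeHandler(this);
    }

    /** Snapshot of the messages seen so far. */
    List<String> warnings() {
        return Collections.unmodifiableList(new ArrayList<>(warnings));
    }

    /** True if any recorded message mentions a cmap problem. */
    boolean sawCmapProblem() {
        for (String w : warnings) {
            if (w.contains("cmap")) {
                return true;
            }
        }
        return false;
    }
}
```

Wrapping the merge or the font scan in a try-with-resources block around the collector and checking {{sawCmapProblem()}} before saving would flag the affected runs. It would not identify which page or font triggered the warning.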
*Questions for the Community:*
# Why would the merger intermittently corrupt the CMap of a subsetted font
that is already valid in the source document?
# Is there a way to force {{PDFMergerUtility}} to *not* rename font subsets
during merging, as we suspect alias clashing is causing the 80% failure rate?
# Is there a more reliable way to "flatten" these Japanese fonts during the
merge process to ensure 100% rendering success?
*How can we resolve this issue? Please help us.*
--
This message was sent by Atlassian Jira
(v8.20.10#820010)