Mruthyunjaya S created PDFBOX-6163:
--------------------------------------

             Summary: Technical Issue Report: Intermittent Data Loss and Font 
Missing Errors during PDF Merging
                 Key: PDFBOX-6163
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6163
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 2.0.33
         Environment: Production environment, Windows Server 2019
            Reporter: Mruthyunjaya S


h3. *Technical Issue Report: Intermittent Data Loss and Font Missing Errors 
during PDF Merging*

*Environment:*
 * *Library:* PDFBox 2.0.33

 * *Operating System:* Windows Server 2019

 * *Application Context:* Windchill MethodServer

 * *Affected Fonts:* Japanese fonts, specifically *MS Gothic* (Subsetted)


*Issue Summary:* We are experiencing a high failure rate ({*}80% failure{*}) 
when merging technical CAD drawings into a single PDF package. Even though the 
source PDFs have fonts embedded, the generated "Merged PDF" frequently suffers 
from missing fonts or garbled text on specific pages.

*Key Observations:*
 * *Reproducibility:* In a test of 10 consecutive merge attempts, the output 
was correct only {*}2 times{*}, while the remaining *8 attempts* resulted in 
missing fonts.

 * *Specific Error:* The issue is almost exclusively linked to Japanese *MS 
Gothic* variants (e.g., {{ACWDKT+MS Gothic}}).

 * *Error Logs:* We frequently encounter {{Format 14 cmap table is not supported}} 
and {{Format 12 cmap contains an invalid glyph index}} warnings during the process.

 * *Client Behavior:* Adobe Acrobat fails to render the text, displaying the 
error: {_}"Cannot extract the embedded font 'ACWDKT+MS Gothic'. Some characters 
may not display or print correctly."{_}.

*Suspected Causes:*
 # *I/O Race Condition:* We intermittently receive {{java.io.IOException: 
Missing root object specification in trailer}}, suggesting the merger may be 
accessing files before they are fully flushed to disk or while they are still 
locked by the external converter.

 # *Resource Clashing:* We suspect that font resource aliases (such as 
{{/F1}}) from different subsetted drawings clash inside {{PDFMergerUtility}}, 
leading to corrupted character maps (CMaps) in the final document.
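
One way to probe the race-condition hypothesis is to check, before calling {{addSource}}, that each source file has stopped growing and ends with a PDF end-of-file marker. The following is a minimal sketch using only the JDK. The names {{PdfReadyCheck}}, {{looksComplete}} and {{waitUntilReady}} are hypothetical, and the 1024-byte tail window and the polling scheme are assumptions rather than anything PDFBox does. A trailing {{%%EOF}} is a necessary sign of a finished file, not proof that its trailer is valid.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class PdfReadyCheck {

    /** True if the file starts with "%PDF-" and has "%%EOF" within its last 1024 bytes. */
    static boolean looksComplete(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            if (len < 16) {
                return false; // too short to be a usable PDF
            }
            byte[] head = new byte[5];
            raf.readFully(head);
            if (!"%PDF-".equals(new String(head, StandardCharsets.ISO_8859_1))) {
                return false;
            }
            int tailLen = (int) Math.min(1024, len); // assumed search window
            byte[] tail = new byte[tailLen];
            raf.seek(len - tailLen);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }

    /** Polls until the size is unchanged between two polls and the file looks complete. */
    static boolean waitUntilReady(Path file, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long lastSize = -1;
        while (System.currentTimeMillis() < deadline) {
            try {
                long size = Files.exists(file) ? Files.size(file) : -1;
                if (size > 0 && size == lastSize && looksComplete(file)) {
                    return true;
                }
                lastSize = size;
            } catch (IOException e) {
                // The writer may still hold a lock; treat as "not ready yet".
                lastSize = -1;
            }
            Thread.sleep(pollMillis);
        }
        return false;
    }
}
```

If the {{Missing root object specification in trailer}} exception disappears once only files that pass such a check are queued, that would point toward the converter hand-off rather than the merger.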

*Current Code Implementation:* We have attempted to fix this by implementing a 
{*}Targeted Healing Pass{*}. We re-open the merged PDF, scan for corrupted 
subsets, and re-embed a full English *Century Gothic* font using 
{{PDResources.put()}} and {{{}PDPageContentStream.AppendMode.APPEND{}}}. 
Despite this, the inconsistency persists.


 

 

 
 # Is there a built-in mechanism in {{PDFMergerUtility}} to "flatten" or 
deduplicate subsetted fonts during the merge to prevent CMap clashing?

 # Given that Format 14/12 warnings are logged but don't throw exceptions, is 
there a recommended way to programmatically detect this "data loss" state 
before the file is saved?

 # Are there known issues with {{setupTempFileOnly()}} vs 
{{setupMainMemoryOnly()}} when dealing with large, complex vector drawings that 
might contribute to trailer parsing failures?

 

 

 

=======================Code used for the merge=====================


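// Imports needed by this method (PDFBox 2.0.x)
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

private void mergeUsingPDFBox(List<String> pdfFiles, String outputFile) throws IOException {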
        PDFMergerUtility merger = new PDFMergerUtility();
        merger.setDestinationFileName(outputFile);

        for (String file : pdfFiles) {
            merger.addSource(new File(file));
        }

        merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
    }

=====================================================

 

 

 

===================== I tried to fix the missing embedded fonts by 
re-embedding them, but the issue persists. ===============================

private void mergeUsingPDFBox(List<String> pdfFiles, String outputPath) throws IOException {
        org.apache.pdfbox.multipdf.PDFMergerUtility merger =
                new org.apache.pdfbox.multipdf.PDFMergerUtility();
        merger.setDestinationFileName(outputPath);

        System.out.println("\n[PHASE 1] Initial PDF Merging...");

        for (String filePath : pdfFiles) {
            File sourceFile = new File(filePath);
            
            // LOGIC: Prevent "Missing root object specification in trailer" error
            // This happens if we try to merge a file that is still 0-bytes (locked by converter)
            if (sourceFile.exists() && sourceFile.length() > 100) {
                merger.addSource(sourceFile);
                System.out.println("  --> Added to merge queue: "
                        + sourceFile.getName() + " [" + sourceFile.length() + " bytes]");
            } else {
                System.out.println("  WARN: Skipping empty/invalid file (might be locked by conversion): "
                        + filePath);
            }
        }

        // Execute merge with main-memory buffering
        // (note: setupMainMemoryOnly() keeps all buffers on the Java heap)
        merger.mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting.setupMainMemoryOnly());
        System.out.println("[PHASE 2] Merge Saved to Disk. Starting CMap Corruption Scan...");

        // RE-OPEN the result to identify and heal Japanese encoding issues
        try (PDDocument mergedDoc = PDDocument.load(new File(outputPath))) {
            
            // LOGIC: Remove security. Modified content streams are blocked if owner password exists.
            if (mergedDoc.isEncrypted()) {
                System.out.println("DEBUG: Removing encryption to allow font re-embedding...");
                mergedDoc.setAllSecurityToBeRemoved(true);
            }

            // Load replacement font ONCE per document loop for memory efficiency
            String fontPath = getFontFileFor("GOTHIC");
            try (InputStream fontStream = new FileInputStream(new File(fontPath))) {
                
                // Load FULL font (no subsetting) to ensure all English character indices exist
                PDType0Font englishFont = PDType0Font.load(mergedDoc, fontStream, false);

                int repairCount = 0;
                for (int i = 0; i < mergedDoc.getNumberOfPages(); i++) {
                    PDPage page = mergedDoc.getPage(i);
                    PDResources res = page.getResources();
                    if (res == null) continue;

                    boolean pageHasError = false;
                    
                    // STEP A: Detect CMap Corruption (Format 12/14 warnings)
                    for (COSName fontAlias : res.getFontNames()) {
                        PDFont font = res.getFont(fontAlias);
                        
                        // Use our helper to force a check of the internal font mapping
                        if (isFontCorrupted(font)) {
                            System.out.println("  ALERT: Page " + (i + 1)
                                    + " contains corrupted Japanese CMap/Subsets. Repairing...");
                            pageHasError = true;
                            break; 
                        }
                    }

                    // STEP B: Heal the problematic page
                    if (pageHasError) {
                        for (COSName fontAlias : res.getFontNames()) {
                            String name = res.getFont(fontAlias).getName().toUpperCase();
                            
                            // LOGIC: Target Gothic subsets (+) that failed the validation check
                            if (name.contains("GOTHIC") || name.contains("+") || name.contains("MS-")) {
                                res.put(fontAlias, englishFont);
                                repairCount++;
                            }
                        }
                        
                        // LOGIC: Re-render the operational stream
                        // Adding a space forces the PDF viewer to reload the character map using our new font
                        try (PDPageContentStream cs = new PDPageContentStream(mergedDoc, page,
                                PDPageContentStream.AppendMode.APPEND, true, true)) {
                            cs.beginText();
                            cs.setFont(englishFont, 1);
                            cs.newLineAtOffset(0, 0);
                            cs.showText(" "); 
                            cs.endText();
                        }
                    }
                }
                System.out.println("INFO : Total font mappings repaired during final pass: " + repairCount);
            }

            // Final Save: Overwrite the merged file with the high-fidelity English version
            mergedDoc.save(outputPath);
            System.out.println("[PHASE 3] Final healing pass complete. Output verified.");

        } catch (Exception e) {
            System.err.println("CRITICAL ERROR: Failed to heal the merged PDF: " + e.getMessage());
            e.printStackTrace();
        }

        System.out.println(">>> SUCCESS! High-Fidelity PDF saved to: " + outputPath + "\n");
    }



=================================================================
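
The healing pass above writes its result back to the same path it was loaded from ({{mergedDoc.save(outputPath)}}), so a failure partway through the save would leave only a damaged copy. A common way to reduce that risk is to write to a temporary file in the same directory and then move it over the final name. The sketch below uses only the JDK; {{AtomicOutput}}, {{Writer}} and {{replaceAtomically}} are hypothetical names introduced for illustration, and the fallback to a plain move is an assumption about what is acceptable when the file system cannot move atomically.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicOutput {

    /** Callback that writes the complete output to the given temporary path. */
    interface Writer {
        void writeTo(Path target) throws IOException;
    }

    /** Writes to a temp file next to finalPath, then moves it into place. */
    static void replaceAtomically(Path finalPath, Writer writer) throws IOException {
        Path dir = finalPath.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "merge-", ".tmp");
        try {
            writer.writeTo(tmp);
            try {
                // With ATOMIC_MOVE other options are ignored; whether an existing
                // target is replaced is implementation specific.
                Files.move(tmp, finalPath,
                        StandardCopyOption.ATOMIC_MOVE,
                        StandardCopyOption.REPLACE_EXISTING);
            } catch (AtomicMoveNotSupportedException e) {
                // Assumed fallback: a non-atomic replace is still better than none.
                Files.move(tmp, finalPath, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // No-op after a successful move; cleans up after a failed write.
            Files.deleteIfExists(tmp);
        }
    }
}
```

With PDFBox 2.0.x the callback could be something like {{tmp -> mergedDoc.save(tmp.toFile())}}, so the original merged file stays untouched until the healed version has been written completely.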
-- 

*Logs Captured during Merge:* Our internal diagnostic tools show the following 
warnings from FontBox during the failing merges:
 * {{org.apache.fontbox.ttf.CmapSubtable: Format 14 cmap table is not supported and will be ignored}}

 * {{org.apache.fontbox.ttf.CmapSubtable: Format 12 cmap contains an invalid glyph index}}
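
Since these messages are only logged, one possible way to notice them programmatically is to attach a handler to the logger named in the output. The sketch below assumes that the logging used by FontBox ends up in {{java.util.logging}} in this deployment (it may instead be routed to Log4j or another backend, in which case an equivalent appender would be needed). {{CmapWarningCollector}} is a hypothetical helper, not part of PDFBox or FontBox.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class CmapWarningCollector extends Handler implements AutoCloseable {

    // Logger name taken from the log lines above; routing to JUL is an assumption.
    static final String LOGGER_NAME = "org.apache.fontbox.ttf.CmapSubtable";

    // Keep a strong reference so the logger is not garbage collected.
    private final Logger logger = Logger.getLogger(LOGGER_NAME);
    private final List<String> warnings = new CopyOnWriteArrayList<>();

    CmapWarningCollector() {
        logger.addHandler(this);
    }

    @Override
    public void publish(LogRecord record) {
        // Record only WARNING and above.
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            warnings.add(record.getMessage());
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered.
    }

    @Override
    public void close() {
        logger.removeHandler(this);
    }

    List<String> warnings() {
        return warnings;
    }
}
```

Wrapping the merge or the healing pass in a try-with-resources block around such a collector would show whether any cmap warning was raised while it ran. The collector sees warnings from all threads, so on a shared server it cannot tell which document triggered a given message.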

*Questions for the Community:*
 # Why would the merger intermittently corrupt the CMap of a subsetted font 
that is already valid in the source document?

 # Is there a way to force {{PDFMergerUtility}} to *not* rename font subsets 
during merging, as we suspect alias clashing is causing the 80% failure rate?

 # Is there a more reliable way to "flatten" these Japanese fonts during the 
merge process to ensure 100% rendering success?
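
On the subset question, it may help to separate two kinds of names: the resource alias (such as {{/F1}}) and the font's own name, which for a subset carries a tag of six uppercase letters followed by a plus sign, as in {{ACWDKT+MS Gothic}}. The healing code treats any name containing {{+}} as a subset, which can match more than intended. A narrower check could look like the sketch below; {{SubsetFontName}} and its methods are hypothetical helpers that use only the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubsetFontName {

    // Subset tag: exactly six uppercase ASCII letters, then '+', then the base name.
    private static final Pattern SUBSET = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    /** True if the name has the six-letter subset tag prefix. */
    static boolean isSubset(String fontName) {
        return fontName != null && SUBSET.matcher(fontName).matches();
    }

    /** Returns the name without the subset tag, or the input unchanged if there is none. */
    static String baseName(String fontName) {
        if (fontName == null) {
            return null;
        }
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }
}
```

Two drawings can each contain a differently tagged subset of the same base font, so comparing {{baseName}} values across the source files is one way to list which subsets meet in the merged document.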

 *How can we resolve this issue? Please help us.*

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
