[jira] [Commented] (PDFBOX-5953) Missing Fields in Table During PDF to Image Conversion

Tilman Hausherr (Jira) Wed, 12 Feb 2025 03:00:54 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926338#comment-17926338
 ]


Tilman Hausherr commented on PDFBOX-5953:
-----------------------------------------

Here's what I wrote on the mailing list (not a solution, sadly):
===
I tried displaying your PDF file in PDFDebugger. Page 1 is OK, but page 2 has 
empty fields. These fields have no appearance stream and NeedAppearances is 
set, so the viewer has to generate the appearances itself. I switched on 
"repair acroform" in PDFDebugger and things got worse: now page 1 is wrong too.
 !screenshot-1.png! 

If I use the Java calls like you did, all pages are wrong, including the first 
one. The SimSun font isn't embedded in the PDF but is installed on my computer 
(as a .ttc file).

Several other viewers display the file properly. Adobe asks for a filename when 
closing, which indicates that it had to repair something.

Here's some code that fixes the rendering of this file:
{code:java}
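import java.io.File;
import java.io.FileInputStream;
import javax.imageio.ImageIO;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.rendering.PDFRenderer;

try (PDDocument doc = Loader.loadPDF(new File("20251103 mail test2.pdf")))
{
    // passing null avoids any fixup
    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(null);
    PDResources dr = acroForm.getDefaultResources();
    // font source: https://fontzone.net/font-details/simsun
    PDFont font = PDType0Font.load(doc, new FileInputStream("SimSun-UNSECURE.ttf"), false);
    // replace the SimSun entry in the default resources with the embedded font
    dr.put(COSName.getPDFName("SimSun"), font);
    acroForm.refreshAppearances();
    PDFRenderer r = new PDFRenderer(doc);
    // page indices are zero-based
    ImageIO.write(r.renderImageWithDPI(0, 300), "png", new File("page1.png"));
    ImageIO.write(r.renderImageWithDPI(6, 300), "png", new File("page7.png"));
}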
{code}
The font I have on Windows is a .ttc file, so I can't embed it. I downloaded a 
simsun.ttf file from a dubious source and used that to replace the SimSun font 
in the default resources.
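
Since the .ttc/.ttf distinction matters here, below is a minimal sketch for checking which kind of container a font file is before trying to load it. It uses only the JDK and looks at the four-byte tag at the start of the file. The class name FontFileSniffer and its methods are hypothetical helpers written for this illustration; they are not part of PDFBox, and the check says nothing about whether a given font will actually embed or render correctly.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a PDFBox class): classifies a font file by its leading 4-byte tag.
final class FontFileSniffer {

    enum Kind { TRUETYPE_COLLECTION, TRUETYPE, OPENTYPE_CFF, UNKNOWN }

    /** Maps the big-endian 4-byte tag at the start of a font file to a container kind. */
    static Kind classify(int tag) {
        switch (tag) {
            case 0x74746366: return Kind.TRUETYPE_COLLECTION; // "ttcf"
            case 0x00010000:                                  // TrueType outlines, version 1.0
            case 0x74727565: return Kind.TRUETYPE;            // "true" (Apple TrueType)
            case 0x4F54544F: return Kind.OPENTYPE_CFF;        // "OTTO"
            default:         return Kind.UNKNOWN;
        }
    }

    /** Reads the first four bytes of the file; throws EOFException if the file is shorter. */
    static Kind classify(Path fontFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(fontFile))) {
            return classify(in.readInt()); // readInt() is big-endian, like the font header
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java FontFileSniffer <font-file>");
            return;
        }
        Path p = Paths.get(args[0]);
        System.out.println(p + ": " + classify(p));
    }
}
```

Running it against the font file installed on the machine and against the downloaded file would show whether each one is a collection or a single font, which is the distinction that decided what could be passed to PDType0Font.load above.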

However, saving the modified document produces a file that Adobe Reader now 
displays incorrectly.

> Missing Fields in Table During PDF to Image Conversion
> ------------------------------------------------------
>
>                 Key: PDFBOX-5953
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5953
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm, Rendering
>    Affects Versions: 2.0.33
>         Environment: Windows10, JDK17
>            Reporter: Vincent Lee
>            Priority: Blocker
>         Attachments: PdfFileTest.java, debugInfo.txt, 
> image-2025-02-12-18-12-15-149.png, image-2025-02-12-18-12-24-291.png, 
> screenshot-1.png, test.pdf, test_page1.png, test_page10.png, test_page2.png, 
> test_page3.png, test_page4.png, test_page5.png, test_page6.png, 
> test_page7.png, test_page8.png, test_page9.png
>
>
> The PDF displays correctly in the MS Edge browser. However, after converting 
> it to an image using PDFBox, some fields begin to appear blank starting from 
> the seventh image.
> Interestingly, the first six images are generated correctly. After comparing, 
> I noticed that some fonts starting from the seventh page of the PDF differ 
> from the ones used earlier.
> I suspect that missing fonts may be the cause of the issue, but since there 
> are no errors or warnings in the debug information, I’m unsure which fonts 
> are missing.
>  
> !image-2025-02-12-18-12-15-149.png!
>  
> !image-2025-02-12-18-12-24-291.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
