[ https://issues.apache.org/jira/browse/PDFBOX-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926338#comment-17926338 ]
Tilman Hausherr commented on PDFBOX-5953:
-----------------------------------------
Here's what I wrote on the mailing list (not a solution, sadly):
===
I tried displaying your PDF file in PDFDebugger. Page 1 is OK, but page 2 has
empty fields. This is because these fields have no appearance streams and
NeedAppearances is set, so the viewer has to generate them. I switched on
"repair acroform" in PDFDebugger and things got terrible: now page 1 is wrong too.
!screenshot-1.png!
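The missing-appearance diagnosis can also be checked programmatically. This is a minimal sketch, assuming the PDFBox 3.0 API (the same one the snippet in this comment uses: Loader.loadPDF and getAcroForm(null) to skip automatic fixups); the class name MissingAppearances is hypothetical. It prints whether NeedAppearances is set and lists the fields whose widgets have no normal appearance stream.

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceDictionary;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

// Hypothetical helper class; assumes PDFBox 3.0.
public class MissingAppearances
{
    /** Fully qualified names of terminal fields with at least one widget lacking a normal appearance. */
    public static List<String> find(PDDocument doc)
    {
        List<String> result = new ArrayList<>();
        // null fixup: inspect the form as stored in the file, without PDFBox repairs
        PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(null);
        if (acroForm == null)
        {
            return result;
        }
        for (PDField field : acroForm.getFieldTree())
        {
            if (!(field instanceof PDTerminalField))
            {
                continue; // only terminal fields own widgets
            }
            for (PDAnnotationWidget widget : ((PDTerminalField) field).getWidgets())
            {
                PDAppearanceDictionary ap = widget.getAppearance();
                if (ap == null || ap.getNormalAppearance() == null)
                {
                    result.add(field.getFullyQualifiedName());
                    break; // report each field once
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException
    {
        try (PDDocument doc = Loader.loadPDF(new File(args[0])))
        {
            PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(null);
            System.out.println("NeedAppearances: " + (acroForm != null && acroForm.getNeedAppearances()));
            for (String name : find(doc))
            {
                System.out.println("no appearance stream: " + name);
            }
        }
    }
}
{code}

If a field is listed here and NeedAppearances is true, the rendered result depends entirely on how well the appearance is generated, which is where a missing font matters.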
If I use the Java calls like you did, all pages are wrong, including the first
one. The SimSun font isn't embedded, but it is installed on my computer (as a .ttc file).
Several other viewers display the file properly. Adobe asks for a filename when
closing, which indicates it had to repair something.
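To find out which fonts are not embedded (like SimSun here), one can walk the AcroForm default resources and the page resources. This is a minimal sketch under the same PDFBox 3.0 assumption; the class name NonEmbeddedFonts is hypothetical, and it does not descend into form XObjects or widget appearance streams, so fonts used only there are not reported.

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

// Hypothetical helper class; assumes PDFBox 3.0.
public class NonEmbeddedFonts
{
    /** Entries of the form "location: /ResourceName -> BaseFont" for non-embedded fonts. */
    public static List<String> find(PDDocument doc) throws IOException
    {
        List<String> result = new ArrayList<>();
        // fonts referenced by the fields' /DA strings live in the AcroForm default resources
        PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(null);
        if (acroForm != null && acroForm.getDefaultResources() != null)
        {
            collect("AcroForm /DR", acroForm.getDefaultResources(), result);
        }
        int pageNo = 1;
        for (PDPage page : doc.getPages())
        {
            PDResources res = page.getResources(); // may be null on an empty page
            if (res != null)
            {
                collect("page " + pageNo, res, result);
            }
            pageNo++;
        }
        return result;
    }

    private static void collect(String where, PDResources res, List<String> out) throws IOException
    {
        for (COSName name : res.getFontNames())
        {
            PDFont font = res.getFont(name);
            if (font != null && !font.isEmbedded())
            {
                out.add(where + ": /" + name.getName() + " -> " + font.getName());
            }
        }
    }

    public static void main(String[] args) throws IOException
    {
        try (PDDocument doc = Loader.loadPDF(new File(args[0])))
        {
            find(doc).forEach(System.out::println);
        }
    }
}
{code}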
Here's some code that fixes the rendering of this file:
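{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.imageio.ImageIO;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.rendering.PDFRenderer;

// font source: https://fontzone.net/font-details/simsun
try (PDDocument doc = Loader.loadPDF(new File("20251103 mail test2.pdf"));
     InputStream fontStream = new FileInputStream("SimSun-UNSECURE.ttf"))
{
    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm(null); // null avoids any fixup
    PDResources dr = acroForm.getDefaultResources();
    // false = embed the full font, not a subset, since form fields may need any glyph
    PDFont font = PDType0Font.load(doc, fontStream, false);
    // replace the non-embedded SimSun entry in the default resources
    dr.put(COSName.getPDFName("SimSun"), font);
    // regenerate the field appearance streams with the embedded font
    acroForm.refreshAppearances();
    PDFRenderer r = new PDFRenderer(doc);
    ImageIO.write(r.renderImageWithDPI(0, 300), "png", new File("page1.png")); // page index 0 = page 1
    ImageIO.write(r.renderImageWithDPI(6, 300), "png", new File("page7.png")); // page index 6 = page 7
}
{code}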
The font I have on Windows is a .ttc file, so I can't embed it. I downloaded a
simsun.ttf file from a dubious source and used it to replace the font in the
default resources.
However, saving this file produces a file that Adobe Reader now displays
incorrectly.
> Missing Fields in Table During PDF to Image Conversion
> ------------------------------------------------------
>
> Key: PDFBOX-5953
> URL: https://issues.apache.org/jira/browse/PDFBOX-5953
> Project: PDFBox
> Issue Type: Bug
> Components: AcroForm, Rendering
> Affects Versions: 2.0.33
> Environment: Windows10, JDK17
> Reporter: Vincent Lee
> Priority: Blocker
> Attachments: PdfFileTest.java, debugInfo.txt,
> image-2025-02-12-18-12-15-149.png, image-2025-02-12-18-12-24-291.png,
> screenshot-1.png, test.pdf, test_page1.png, test_page10.png, test_page2.png,
> test_page3.png, test_page4.png, test_page5.png, test_page6.png,
> test_page7.png, test_page8.png, test_page9.png
>
>
> The PDF displays correctly in the MS Edge browser. However, after converting
> it to images using PDFBox, some fields appear blank starting from
> the seventh image.
> Interestingly, the first six images are generated correctly. Comparing them,
> I noticed that starting from the seventh page, some fonts in the PDF differ
> from the ones used earlier.
> I suspect that missing fonts may be the cause of the issue, but since there
> are no errors or warnings in the debug information, I’m unsure which fonts
> are missing.
>
> !image-2025-02-12-18-12-15-149.png!
>
> !image-2025-02-12-18-12-24-291.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)