Antonio Pozo created PDFBOX-6050:
------------------------------------

             Summary: Japanese text not rendered when converting editable PDFs to images
                 Key: PDFBOX-6050
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6050
             Project: PDFBox
          Issue Type: Bug
            Reporter: Antonio Pozo


We are using the PDFBox library to convert PDF documents to images. We have 
found that, for many _editable_ PDFs (PDF forms) containing Japanese text, the 
generated images do not display the Japanese characters (it seems PDFBox is 
unable to extract or render them). Text in other languages within the same 
document is rendered correctly.

This issue does *not* occur with non-editable PDFs; in those cases, Japanese 
text is rendered without problems.

We have observed that if we open these editable PDFs in a PDF editor and simply 
save them again (without making any changes), PDFBox is then able to generate 
the images with the Japanese text rendered correctly. We have also tried 
setting a [font that supports the Japanese 
alphabet|https://fonts.google.com/noto/specimen/Noto+Sans+JP] in the library 
before converting the PDF to an image, without success.

Other tools (such as Adobe Reader) are able to display and extract the Japanese 
text from these editable PDFs without requiring the open-and-save step.
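
To help tell apart "the Japanese text is missing from the extracted text" 
from "the text is present but its glyphs are not drawn", a small check such 
as the following could be run on whatever text a given tool extracts from 
the document. This is only a minimal sketch: it does not call PDFBox, it 
assumes the text has already been extracted into a String by some other 
means, and the class and method names are hypothetical. Note that the Han 
script also covers Chinese characters, so the check is approximate.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper, not part of PDFBox: reports whether a string
// contains any code point from the scripts used to write Japanese.
public class JapaneseTextCheck {

    // Returns true if at least one code point is Hiragana, Katakana, or Han.
    // Assumption: Han is treated as Japanese here, although it is shared
    // with Chinese, so this is a heuristic rather than a language detector.
    public static boolean containsJapanese(String text) {
        return text.codePoints().anyMatch(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            return script == UnicodeScript.HIRAGANA
                || script == UnicodeScript.KATAKANA
                || script == UnicodeScript.HAN;
        });
    }

    public static void main(String[] args) {
        // Sample strings standing in for text extracted from a form field.
        System.out.println(containsJapanese("氏名: 山田太郎"));
        System.out.println(containsJapanese("Name: John Smith"));
    }
}
```

If such a check succeeds on the extracted text while the generated image 
still lacks the characters, that would point towards rendering rather than 
extraction.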

*Expected behavior:*
When converting editable PDFs containing Japanese text to images, the Japanese 
characters should be rendered correctly without requiring the document to be 
re-saved.

*Actual behavior:*
When converting editable PDFs containing Japanese text to images, the Japanese 
characters are missing in the generated images, while text in other languages 
is rendered correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

