Antonio Pozo created PDFBOX-6050:
------------------------------------

             Summary: Japanese text not rendered when converting editable PDFs to images
                 Key: PDFBOX-6050
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6050
             Project: PDFBox
          Issue Type: Bug
            Reporter: Antonio Pozo

We are using the PDFBox library to convert PDF documents to images. For many _editable_ PDFs (PDF forms) containing Japanese text, the generated images do not show the Japanese characters; PDFBox appears unable to extract or render them. Text in other languages within the same document is rendered correctly.

This issue does *not* occur with non-editable PDFs; in those cases, Japanese text is rendered without problems.

If we open these editable PDFs in a PDF editor and simply save them again (without making any changes), PDFBox is then able to generate the images with the Japanese text rendered correctly.

We have also tried setting a [font that supports the Japanese alphabet|https://fonts.google.com/noto/specimen/Noto+Sans+JP] in the library before converting the PDF to an image, without success.

Other tools (such as Adobe Reader) are able to display and extract the Japanese text from these editable PDFs without requiring the open-and-save step.

*Expected behavior:*
When converting editable PDFs containing Japanese text to images, the Japanese characters should be rendered correctly without requiring the document to be re-saved.

*Actual behavior:*
When converting editable PDFs containing Japanese text to images, the Japanese characters are missing from the generated images, while text in other languages is rendered correctly.