Hello, I am building a web app that receives text from a rich text component (TinyMCE), and I plan to build a PDF from this text using PDFBox. Hard copy-paste from a Word document is allowed.
As my server runs on Debian, everything is encoded and decoded in UTF-8 (the server, the JVM, and the database storing the text). Nevertheless, I am French (as you can see from my poor English), and so is the text. Eventually, I would like to handle any language. I first unescape the XML rich text with StringEscapeUtils from Apache, but the result is that characters like the right quote are not rendered correctly in the PDFs. Re-encoding the text in UTF-16 seemed to be a solution, but I am facing this bug: https://issues.apache.org/jira/browse/PDFBOX-1242 As nobody seems to be fixing this bug, is there any workaround I could use?
Thanks,
Kévin
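
P.S. To make the problem concrete, here is a minimal sketch (not my actual code) that lists the characters in a string that ISO-8859-1 cannot encode. It assumes, without my having confirmed it, that the characters failing in the PDF are the ones outside that charset. The class name `NonLatin1Finder` and the sample string are made up for illustration, and the sketch uses only the JDK, not PDFBox.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NonLatin1Finder {

    // Returns the code points of `text` that ISO-8859-1 cannot encode,
    // formatted as "U+XXXX", in order of appearance.
    static List<String> unencodable(String text) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        List<String> result = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (!latin1.canEncode(s)) {
                result.add(String.format("U+%04X", cp));
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample: French text with a typographic apostrophe
        // (U+2019), as a copy-paste from Word typically produces.
        String sample = "L\u2019\u00e9t\u00e9 arrive";
        System.out.println(unencodable(sample));
    }
}
```

With this sample, the accented letters pass the check and only the typographic apostrophe is reported, which matches the kind of character I see rendered incorrectly.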