[ https://issues.apache.org/jira/browse/PDFBOX-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334716#comment-16334716 ]
ASF subversion and git services commented on PDFBOX-4076:
---------------------------------------------------------

Commit 1821916 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1821916 ]

PDFBOX-4076: don't replace characters outside of USASCII with "?", as suggested by Michael Klink

> PDFBox cannot properly handle PDF Name objects containing bytes with values outside the US_ASCII range
> ------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4076
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4076
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.8
>            Reporter: Tilman Hausherr
>            Priority: Major
>
> As reported by [~mkl] in his SO answer
> {quote}The first error in PDF Name handling is that PDFBox internally represents them as strings after a mixed UTF-8 / CP-1252 decoding strategy. This is wrong, according to the PDF specification a name object is an atomic symbol uniquely defined by a sequence of any characters (8-bit values) except null (character code 0).
> (...)
> The second error is, though, that while serializing the PDF it only properly encodes the characters in the strings representing names which are from US_ASCII, all else are replaced by '?'
> {quote}
> sample code
> {code:java}
> PDDocument document = new PDDocument();
> PDPage page = new PDPage();
> document.addPage(page);
> document.getDocumentCatalog().getCOSObject().setString(COSName.getPDFName("äöüß"), "äöüß");
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> document.save(baos);
> document.close();
> document = PDDocument.load(baos.toByteArray());
> System.out.println(document.getDocumentCatalog().getCOSObject().keySet());
> document.close();
> {code}
> output:
> {noformat}
> [COSName{Type}, COSName{Version}, COSName{Pages}, COSName{????}]
> {noformat}
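
For illustration only: the issue turns on the point that a PDF name is a sequence of bytes, so a serializer can write any byte outside the printable ASCII range as a "#xx" hex escape instead of replacing it with "?". The sketch below shows that idea in plain Java without PDFBox. The class PdfNameEscaper and its methods escape and unescape are hypothetical names made up for this example. This is not the code from commit 1821916, and the set of characters PDFBox itself chooses to escape may differ.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of PDFBox: treats a PDF name as raw bytes.
final class PdfNameEscaper {
    private static final String HEX = "0123456789ABCDEF";
    // PDF delimiter characters plus '#', which must itself be escaped.
    private static final String MUST_ESCAPE = "()<>[]{}/%#";

    // Serializes name bytes as a PDF name token, e.g. "/Type".
    static String escape(byte[] nameBytes) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : nameBytes) {
            int v = b & 0xFF; // read the byte as an unsigned value
            if (v == 0) {
                throw new IllegalArgumentException("a name must not contain byte 0");
            }
            if (v < 0x21 || v > 0x7E || MUST_ESCAPE.indexOf(v) >= 0) {
                // Hex-escape instead of substituting '?', so no byte is lost.
                sb.append('#').append(HEX.charAt(v >> 4)).append(HEX.charAt(v & 0x0F));
            } else {
                sb.append((char) v);
            }
        }
        return sb.toString();
    }

    // Parses a token produced by escape() back into the original bytes.
    static byte[] unescape(String token) {
        if (token.isEmpty() || token.charAt(0) != '/') {
            throw new IllegalArgumentException("a name token starts with '/'");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 1; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '#') {
                if (i + 2 >= token.length()) {
                    throw new IllegalArgumentException("truncated #xx escape");
                }
                out.write(Integer.parseInt(token.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }
}
```

Under these assumptions, the UTF-8 bytes of "äöüß" from the sample code would be written as /#C3#A4#C3#B6#C3#BC#C3#9F, and unescape returns the same eight bytes, so the key survives a save and reload instead of collapsing to "????".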