The bug appears to be fixed in Ghostscript (I have not verified it myself); http://bugs.ghostscript.com/show_bug.cgi?id=693477 says:
Technically the 'correct' approach is to define a PDFDSCEncoding which maps the non-ASCII values. However, this is non-trivial, and counter-intuitive. I've made changes so that in the absence of a PDFDSCEncoding we will assume that any non-UTF-16BE string is using PDFDocEncoding. We then convert that to UTF-16BE and on to UTF-8. This should resolve the problem.

See commit: a3d00daf5f9abb1209cb750a95e23bc6951c1c63

Commit log:

pdfwrite - convert non-UTF-16BE doc info to UTF-8 assuming PDFDocEncoding

Bug #693477 "Encoding of pdf metadata do not comply with pdf standard"

When processing Document info there is a pdfwrite parameter 'PDFDSCEncoding' which, if present, is used to process the string into ASCII. However, if this parameter is not supplied, we don't re-encode the string at all. Since the XML must be UTF-8, this is potentially a problem.

Since we cannot know the source of the docinfo string (existing PDF, DOCINFO pdfmark, or DSC comments in PostScript), we cannot make any judgement about the encoding of the string data in the absence of PDFDSCEncoding. So we choose to assume that it is encoded using PDFDocEncoding if it does not have a UTF-16BE BOM (which is the only other format permitted). This should at least mean that the Docinfo and XML match and are legal.

No differences expected, the cluster doesn't check the XML
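
For anyone curious what that fallback amounts to, here is a minimal Python sketch (not Ghostscript's actual C code) of the rule described above: if a DocInfo string carries the UTF-16BE BOM it is decoded as UTF-16BE, otherwise it is treated as PDFDocEncoding and re-encoded as UTF-8. The PDFDOC_OVERRIDES table is deliberately partial (just a few of the typographic characters in the 0x80-0x9F range); the full PDFDocEncoding table is in Annex D of the PDF specification, and falling back to Latin-1 for the remaining bytes is only an approximation.

# Sketch of the fallback: convert a raw DocInfo string to UTF-8,
# assuming PDFDocEncoding unless a UTF-16BE BOM is present.
# PDFDOC_OVERRIDES is intentionally incomplete; see PDF spec Annex D
# for the full PDFDocEncoding table.

PDFDOC_OVERRIDES = {
    0x80: "\u2022",  # bullet
    0x81: "\u2020",  # dagger
    0x83: "\u2026",  # horizontal ellipsis
    0x84: "\u2014",  # em dash
    0x85: "\u2013",  # en dash
}

def docinfo_to_utf8(raw: bytes) -> bytes:
    """Convert a raw DocInfo string to UTF-8 bytes."""
    if raw.startswith(b"\xfe\xff"):
        # UTF-16BE with BOM: the only other format the spec permits here.
        text = raw[2:].decode("utf-16-be")
    else:
        # Assume PDFDocEncoding. Bytes outside the override table fall
        # back to Latin-1, which agrees with PDFDocEncoding over most of
        # the printable range (an approximation, not the full table).
        text = "".join(
            PDFDOC_OVERRIDES.get(b, bytes([b]).decode("latin-1"))
            for b in raw
        )
    return text.encode("utf-8")

if __name__ == "__main__":
    # b"\xfe\xff\x00T\x00\xe9" is "Te" with an acute accent in UTF-16BE;
    # b"Caf\xe9" is the same kind of string in PDFDocEncoding.
    print(docinfo_to_utf8(b"\xfe\xff\x00T\x00\xe9"))
    print(docinfo_to_utf8(b"Caf\xe9"))

The point of the sketch is just the two-branch decision: a BOM check decides between the two encodings the spec allows for DocInfo strings, and everything ends up as UTF-8 so the DocInfo and the XMP XML agree.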
