"[email protected]" <[email protected]> writes:

> Hi,
>
> Thanks for reporting this. This error is on the parse of the metadata.
> I have no time right now to look in deep at it, will try to do next
> week, but the description you give is wrong to my eyes, so another
> thing must be happening. I'll try to explain. One thing is that the
> character "ä" is U+00e4, and another thing is how to code this
> character in UTF-8, where you need two bytes, and the code is c3 a4,
> so if lilypond are trying to code "ä" as a e4, this is not a valid
> UTF-8 code!

Sure, it isn't. But pdfmarks are not encoded in UTF-8. They are encoded
either in PDFDocEncoding (which agrees with Latin-1 for characters such
as "ä") or in UTF-16BE with a byte order mark. Complain to Adobe about
their choice, but as long as that is the way PDF encodes stuff, Evince
can't unilaterally decide on something saner.

> Please note that the code that throws the error is the libxml parser,
> which usually is very strict about encodings and things like that.

The respective part in the PDF looks like

    <</Producer(GPL Ghostscript 9.06)
    /CreationDate(D:20121128183026+01'00')
    /ModDate(D:20121128183026+01'00')
    /Creator(LilyPond 2.17.7)
    /Author(\344 \366)
    /Title(\376\377\003\262)
    /Composer(\344 \366)>>endobj

As you can see, there is no XML involved here at all.

Note that the PDF in the original report was generated from an input
file accidentally written in Latin-1 (LilyPond requires UTF-8 input), so
all bets are off with that. However, even when the input is correctly
encoded as UTF-8, at least the author field will still be written out
in Latin-1/PDFDocEncoding, and Evince (in contrast to other viewers and
pdfinfo) will complain with the mentioned XML error.

Since it would appear that Evince generates that XML itself as part of
its internal operations, it seems to fail to convert PDFDocEncoding to
UTF-8 in the process.

-- 
David Kastrup

_______________________________________________
evince-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/evince-list
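
For illustration, the decoding rule described above can be sketched in Python. This is only a minimal sketch, not what Evince or any PDF library actually does: the function names `unescape_pdf_literal` and `decode_pdf_text` are made up for this example, only `\ddd` octal escapes are handled, and Python's `latin-1` codec is used as a stand-in for PDFDocEncoding (the two agree for the bytes shown in the PDF excerpt, but not for every byte value).

```python
import re


def unescape_pdf_literal(body: str) -> bytes:
    """Convert the \\ddd octal escapes of a PDF literal string body to raw bytes.

    Hypothetical helper: other PDF escapes (\\n, \\(, ...) are not handled.
    """
    def octal_to_byte(match):
        # Keep only the low 8 bits, so the result always fits in one byte.
        return bytes([int(match.group(1), 8) & 0xFF])

    return re.sub(rb"\\([0-7]{1,3})", octal_to_byte, body.encode("ascii"))


def decode_pdf_text(raw: bytes) -> str:
    """Decode a PDF text string into a Python (Unicode) string."""
    if raw.startswith(b"\xfe\xff"):
        # Byte order mark present: the rest is UTF-16BE.
        return raw[2:].decode("utf-16-be")
    # Assumption: latin-1 stands in for PDFDocEncoding here.
    return raw.decode("latin-1")


author = decode_pdf_text(unescape_pdf_literal(r"\344 \366"))
title = decode_pdf_text(unescape_pdf_literal(r"\376\377\003\262"))

print(author)                  # ä ö
print(title)                   # β
print(author.encode("utf-8"))  # b'\xc3\xa4 \xc3\xb6'
```

The last line shows the step that appears to be missing: once the metadata has been decoded to Unicode, re-encoding it as UTF-8 yields the `c3 a4` sequence that a strict XML parser expects, rather than the bare `e4` byte.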
