On Sat, Jul 07, 2012 at 02:41:41PM -0600, Jack M wrote:
>
> I sometimes send messages that contain the lowercase 'o' with an umlaut
> over it, i.e., ö, unicode char 246.  I compose my messages in vim, with
> the encodings all set to utf-8.
>
> Occasionally I can see that a message that I sent (e.g., to myself) has
> been unexpectedly encoded with quoted-printable, when I have the 'ö' in
> the message body.  Such messages also declare the charset as Latin1
> (which, I presume, was done by mutt, using $send_charset).
>
> Somewhere along the line, the quoted-printable translation of 'ö' gets
> messed up.  Apparently, the raw text of such a message uses =C3=B6 to
> encode 'ö'.  (In case this gets garbled, that's "equalsign, C3,
> equalsign, B6").  Mutt transparently decodes this (I guess) and shows me
> an umlauted 'o' in the pager.  What's funny is that =C3=B6 is not the
> correct QP-encoding for the umlauted 'o'; hence if I view the raw
> message text in vim (either by pressing 'e' from the pager or by saving
> to disk first), and then un-encode from QP, I get not one but two
> unicode characters, and both are incorrect.  (Namely, a capital A with a
> tilde on top, and some strange other thing).
>
> As far as I can tell, the umlauted, lowercase 'o' is char 246 in both
> UTF8 and Latin1.  And as far as I can tell, the correct QP-translation
> of character 246 ought to be =F6 (equalsign, F6).  But the raw message
> text has *two* QP characters, =C3=B6, neither of which is correct.
>

Well, yes, in Unicode it is still decimal 246 (U+00F6), but encoding
that in UTF-8 takes two bytes, which happen to be 0xC3 0xB6.  I used to
know the (tedious) mechanics of how to decode UTF-8, but nowadays I just
google for the hex codes, "unicode C3B6" in this case.  The details of
the encoding are in the UTF-8 page on Wikipedia.
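
For what it's worth, the byte arithmetic can be checked with a short
Python sketch.  Python is just my choice for illustration, not what mutt
or vim do internally.  It uses only the standard quopri module, and the
variable names are made up for the example.

```python
import quopri

ch = "\u00f6"  # lowercase o with umlaut, code point 246

# The code point is the same number regardless of charset.
assert ord(ch) == 246

# The bytes that end up in the message depend on the charset.
utf8_bytes = ch.encode("utf-8")      # two bytes: 0xC3 0xB6
latin1_bytes = ch.encode("latin-1")  # one byte: 0xF6

# Two-byte UTF-8 by hand (valid for code points 0x80..0x7FF):
# 110xxxxx 10xxxxxx, where the x bits are the code point.
cp = ord(ch)
manual = bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])

# Quoted-printable escapes each non-ASCII byte as =XX.
qp_utf8 = quopri.encodestring(utf8_bytes)      # b'=C3=B6'
qp_latin1 = quopri.encodestring(latin1_bytes)  # b'=F6'

# Un-encoding =C3=B6 gives the two UTF-8 bytes back.  Reading them as
# UTF-8 recovers the original character; reading them as Latin1 gives
# two characters (A with tilde, then a pilcrow sign).
raw = quopri.decodestring(b"=C3=B6")
correct = raw.decode("utf-8")
mojibake = raw.decode("latin-1")

print(utf8_bytes, latin1_bytes, manual)
print(qp_utf8, qp_latin1)
print(repr(correct), repr(mojibake))
```

If the two characters in the last step match what you get after
un-encoding by hand in vim, that would suggest the body bytes are UTF-8
and are only being read as Latin1.
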
I know nothing about the details of quoted-printable (apart from what
I've just read on Wikipedia).  Certainly, that message isn't Latin1,
it's UTF-8.  I suspect that the key is to find out *why* that message
has been sent as quoted-printable Latin1.  Your post here, by contrast,
is text/plain UTF-8 and reads fine.

> So my questions are these:
>
> 1) How is the QP encoding of a perfectly good UTF-8 text getting
> mangled?  Is mutt screwing it up when I ask for the raw message source?
>
> 2) Given that it is mangled, how is it that mutt is nevertheless able to
> decode and display it properly?
>
> I note that I'm on MacOSX 10.5, with $LANG as en_US.UTF-8 in
> Terminal.app.  I also note that I get the mangling whether I use console
> vim or the MacVim GUI.
>
> -Jack
>

I'll also note that in the past vim has often managed to display Latin1
and UTF-8 together (in a UTF-8 term), which initially caused me a lot of
confusion when I was trying to read some Linux kernel patches that
changed Latin1 (author names, mostly) in comments to UTF-8.

ĸen
-- 
das eine Mal als Tragödie, das andere Mal als Farce
