>The «"» around `Blind-Carbon-Copy' should be \(lq and \(rq, or the
>equivalent strings for consistency with the style used at start of the
>paragraph.

So, in a mostly unrelated note ... I couldn't help noticing that Ralph
used guillemets («») in one of his messages on this thread (way to push
non-US-ASCII characters, Ralph!), and after a series of replies to his
note things devolved into classic mojibake. And since hopefully most
everyone on this thread is an nmh user, I wanted to understand why,
because really that shouldn't have happened.

I went back to the raw archives (ftp://lists.gnu.org/nmh-workers/2019-02)
because the mailing list software will sometimes translate stuff into
base64 encoding when it sees non-ASCII characters. And, well, I hate to
assign blame, but I think it's a bit unavoidable ... please, don't anyone
take this as a personal attack, I am just trying to understand how we
could do better.

Ralph's original note containing the guillemets (Message-Id
<[email protected]>) was text/plain, a character set of utf-8, and
encoded using quoted-printable. The characters were encoded properly
using quoted-printable, specifically they were listed as =C2=AB and
=C2=BB.

Valdis was the first reply to that (Message-ID <[email protected]>),
and HIS email was text/plain, character set iso-8859-1, and encoded
using quoted-printable. He quoted Ralph's message, and the guillemets
were encoded as =AB and =BB. Which seems correct to me.

Paul Fox replied to Valdis's note (Message-Id <[email protected]>),
and THAT note was text/plain, character set UTF-8, encoded using
quoted-printable ... but it seems like this was the start of where
things went off the rails. The original line in Valdis's email was (in
raw form):

    > The =AB=22=BB around ...

But in Paul's note it ended up as (extra > added in the reply)

    > > The =AB" =BB around

This is NOT correct. First, there is an extra space in front of the
encoded bytes. Secondly, they're not valid UTF-8; they're the ISO-8859-1
bytes. So I am guessing whatever Paul used to quote the reply didn't
translate the ISO-8859-1 characters properly into UTF-8.

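To make the byte-level story above concrete, here is a small Python
sketch. It is only an illustration: it is not nmh code, and it makes no
claim about what anyone's mailer actually does. It decodes the raw
quoted-printable fragments quoted above with the standard quopri
module, checks which of them are valid UTF-8, and shows the charset
translation a reply quoter would have needed. The helper name
is_valid_utf8 is made up for this example.

```python
import quopri


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Ralph: charset utf-8, guillemets sent as =C2=AB and =C2=BB.
ralph_guillemets = quopri.decodestring(b"=C2=AB=C2=BB").decode("utf-8")

# Valdis: charset iso-8859-1, same characters sent as =AB and =BB.
valdis_bytes = quopri.decodestring(b"The =AB=22=BB around")
valdis_text = valdis_bytes.decode("iso-8859-1")

# Paul: labeled UTF-8, but the body still carries the ISO-8859-1 bytes.
paul_bytes = quopri.decodestring(b'The =AB" =BB around')

print(ralph_guillemets in valdis_text)   # True: same characters, different bytes
print(is_valid_utf8(paul_bytes))         # False: 0xAB cannot start a UTF-8 sequence

# What a reply quoter would need to do: decode using the charset of the
# message being quoted, then re-encode in the charset of the reply.
fixed_bytes = valdis_text.encode("utf-8")
fixed_raw = quopri.encodestring(fixed_bytes)
print(is_valid_utf8(fixed_bytes))        # True
print(fixed_raw)
```

The point of the last step is that the translation has to happen on
decoded text; copying the raw bytes from an iso-8859-1 message into a
message labeled UTF-8 is what produces the invalid sequence.
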
However, whatever Mark Bergman uses for email actually made an
intelligent decision. When he replied to Paul's note, those invalid
UTF-8 characters got converted to the Unicode Replacement Character
(U+FFFD), which was sent out as =EF=BF=BD (utf-8, quoted-printable).

Further muddying the waters ... when Ralph replied to Mark's email,
those Unicode Replacement Characters somehow got converted back to the
correct guillemets (=C2=AB and =C2=BB). Which means Ralph has perhaps
the most intelligent reply quoting program ever and he should
immediately share it as it would revolutionize AI, or he went back and
manually corrected it when he replied to Mark's note. I'm 50/50 on
which one of those scenarios is more likely.

If anyone involved with this email thread wants to pipe up with some
more explanation on what exactly they used to compose their email
replies, I would love to hear it. No judgements; I just want to know
how nmh could help everyone do better. Like, do we need to include
better tools for composing reply messages? Well, duh, the answer to
that is "yes", and I think replyfilter does ok here but obviously we
need to do better. But if we're SENDING something that is not valid
UTF-8, should we be smarter and flag it? People were upset when we
refused to send out 8-bit characters when your locale was US-ASCII (I
mean, REALLY? I couldn't believe it), so I don't know what makes sense.
Sending out invalid UTF-8 just seems wrong to me.

--Ken

-- 
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers
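
P.S. As a rough illustration of the two behaviors discussed above
(replacing invalid bytes with U+FFFD, and flagging invalid UTF-8 before
sending), here is a minimal Python sketch. It is not how nmh or any
particular mailer works; first_invalid_offset is a hypothetical helper
invented for this example, and it assumes the body is already decoded
from its transfer encoding.

```python
import quopri

# Body bytes from the reply that was labeled UTF-8 but carried
# ISO-8859-1 bytes for the guillemets.
paul_bytes = quopri.decodestring(b'The =AB" =BB around')

# Lenient decoding: every invalid byte becomes U+FFFD.
lenient = paul_bytes.decode("utf-8", errors="replace")
requoted = quopri.encodestring(lenient.encode("utf-8"))
print(requoted)  # each replacement character shows up as =EF=BF=BD


def first_invalid_offset(body: bytes, charset: str):
    """Hypothetical pre-send check.

    Returns None if body decodes cleanly in charset, otherwise the
    byte offset of the first undecodable byte. An unknown charset
    name raises LookupError.
    """
    try:
        body.decode(charset)
    except UnicodeDecodeError as err:
        return err.start
    return None


print(first_invalid_offset(paul_bytes, "utf-8"))       # 4
print(first_invalid_offset(paul_bytes, "iso-8859-1"))  # None
```

Note that the replacement step is lossy: both « and » turn into the
same U+FFFD, so nothing downstream can recover the original characters
from the bytes alone. A check like the one above would only tell the
sender that the body does not match its declared charset; what to do
about it (refuse, warn, or relabel) is a separate policy question.
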
