On Mon, Apr 25, 2022 at 11:08:41AM +0000, Joel Buckley wrote:
> Hi all,
>
> I have been using mutt for some time on a VT510 terminal (similar to
> https://en.wikipedia.org/wiki/VT520), and enjoying it.

An actual serial hardware terminal? Those are getting to be rare
beasts indeed... ;-)

> The display does not support UTF-8, so I had
> LC_ALL="en_US.ISO8859-1" in my ~/.profile. This worked well for
> mutt.

So here you say the terminal doesn't support UTF-8...

> I then discovered that by changing mutt to load with
> LC_ALL="en_US.UTF-8" that all was well.

Huh? These two things seem to be contradictory... Also, I'm assuming
this message was sent from Mutt NOT using your non-UTF-8-supporting
terminal, since it is indeed encoded in UTF-8 and contains actual
UTF-8 characters...

Anyway, getting back to the normal order of things...

> However today I received an email with the string "Don=E2=80=99t
> know when I will be there next.". This should display as something
> like "Don't know where I will be there next.". In my mutt terminal,
> it displayed:
>
> "Don???t know when I will be there next".

The issue is that there are no curly quotes in iso8859-1. Both
Windows and Mac support a modified version of iso8859-1 that includes
curly quotes, but unfortunately use different character codes for
them. These character sets have their own names (Windows-1252 and Mac
OS Roman), but frequently mail applications are misconfigured to
label them iso8859-1, because they're mostly identical and it works
most places--as long as you're on the same platform as the sender.

> Thinking this was odd, I dove into my filter.sh script, and
> discovered that no end of hacking would enable me to filter out the
> '=E2=80=99' before display --- there seemed to be some amount of
> parsing before my filter got ahold of it. All that I could match on
> was '???', despite being able to edit the content of the mail
> itself, and see the string '=E2=80=99'. My filter line of
> significance is:
>
> output=`echo "$output" | sed "s/[’‘]/$(echo "27" | xxd -p -r)/g"`
> This replaces 'smart quotes' with their ASCII equivalents.

Given that you already have a display filter script, this isn't a
horrible solution--assuming it actually worked.

Note that you have a couple of harmless oddities though:

1. The double quotes around 27 look doubled up, but inside $(...) the
   shell starts a fresh quoting context, so they nest rather than end
   the outer string. They're harmless, and 27 needs no quoting in the
   first place, but you don't need any of this anyway:

2. You needlessly fork two additional processes--one for the subshell
   for echo, another for xxd. This can be greatly simplified to:

   echo "$output" | sed "s/[’‘]/'/g"

Presumably you avoided this because the single quote is "special" to
the shell, but since in this case it is enclosed in double quotes it
loses its specialness.

> Thinking that this would be a matter of ensuring that the filter
> script had the right character support, I added "export
> LC_ALL="en_US.UTF-8"" to the top of my filter script, however this
> did nothing for me.

Your filter script will run with the same locale as mutt, since it is
a subprocess--it inherits the locale from its parent. So if mutt were
indeed started with LC_ALL=en_US.UTF-8, then your display filter
would be running with it too. But you shouldn't need to do any of
this...

> After some messing around, it seemed that the
> only way to get mutt to support the filtering of my problematic
> string was to call mutt itself with the required character encoding
> (UTF-8).

What character set is the message itself encoded with (according to
its headers)? If your terminal is set up right, and the charset on
the message is correct, then Mutt should be taking care of this
already for you by running iconv on the message. Basically, except in
rare cases, if your terminal is set up properly, you shouldn't ever
need to deal with character sets explicitly.

> Is this correct and best-practice, or have I missed something here?
> My installation is currently working by using the 'export
> LC_ALL="en_US.UTF-8"' line in my ~/.profile, however this feels like
> bad practice

Because it is. But I think you may have one of the rare cases.

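To make the two shell points above concrete (an apostrophe inside a double-quoted sed script, and a child process inheriting its parent's locale), here is a minimal sketch. The function names are hypothetical and not part of the original filter.sh. It also swaps the bracket expression for two plain substitutions, on the assumption that the script may run in a non-UTF-8 locale, where sed would treat each byte of a curly quote in [’‘] as a separate character.

```shell
#!/bin/sh
# Hypothetical helper (not from the original filter.sh): replace curly
# single quotes with an ASCII apostrophe. The apostrophe sits inside a
# double-quoted sed script, so the shell passes it through literally.
# Two plain substitutions are used instead of a bracket expression so
# the match works byte-for-byte regardless of the current locale.
straighten_quotes() {
    sed -e "s/’/'/g" -e "s/‘/'/g"
}

# Hypothetical demonstration: a variable set for the parent command is
# visible in the child shell, just as a display filter sees the locale
# that mutt itself was started with.
show_child_locale() {
    LC_ALL=C sh -c 'printf "%s\n" "$LC_ALL"'
}

printf '%s\n' "Don’t know when I will be there next." | straighten_quotes
show_child_locale
```

Note that this only helps if the curly quotes actually reach the filter; if they have already been turned into '?' upstream, there is nothing left for sed to match.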
I think what's happening is that Mutt is correctly running iconv to
convert your message from UTF-8 (which it is most likely encoded in)
to iso8859-1. That conversion partially fails due to the annoying
curly quotes, which have no equivalent in iso8859-1 and so become
'?'. Only then is the result passed to your filter script, which
therefore never sees the original characters.

Assuming that's true, the only thing I can think of is an old trick
that iconv supports, which I vaguely remember using in Mutt *ages*
ago. Try explicitly setting $charset *IN MUTT* to
ISO-8859-1//TRANSLIT, which might or might not help. But it's likely
to have other negative effects...

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will
result in undeliverable mail due to spam prevention.  Sorry for the
inconvenience.
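For illustration, the difference between a strict conversion and the //TRANSLIT trick can be tried outside Mutt with the iconv command-line tool. This is only a sketch: it approximates the conversion and is not how Mutt calls iconv internally, the function names are hypothetical, and //TRANSLIT is an extension whose behaviour depends on the iconv implementation (glibc and GNU libiconv support it; others may not).

```shell
#!/bin/sh
# Strict conversion: a character with no ISO-8859-1 equivalent usually
# makes iconv report an error (details vary between implementations).
to_latin1_strict() {
    iconv -f UTF-8 -t ISO-8859-1
}

# With //TRANSLIT, implementations that support the suffix try to
# substitute a similar-looking character instead of failing.
to_latin1_translit() {
    iconv -f UTF-8 -t ISO-8859-1//TRANSLIT
}

sample="Don’t know when I will be there next."

# Expect a failure (or a placeholder character) for the curly quote.
printf '%s\n' "$sample" | to_latin1_strict \
    || echo "strict conversion failed" >&2

# With glibc this typically yields an ASCII apostrophe in place of the
# curly quote; treat that as an assumption, not a guarantee.
printf '%s\n' "$sample" | to_latin1_translit \
    || echo "transliteration not supported here" >&2
```

If the second command prints a readable apostrophe on your system, the corresponding muttrc setting would be along the lines of `set charset="ISO-8859-1//TRANSLIT"`, with the caveat already given above that it may have other side effects.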