On Mon, Apr 25, 2022 at 11:08:41AM +0000, Joel Buckley wrote:
> Hi all,
> 
> I have been using mutt for some time on a VT510 terminal (similar to
> https://en.wikipedia.org/wiki/VT520), and enjoying it. 

An actual serial hardware terminal?  Those are getting to be rare
beasts indeed... ;-)

> The display does not support UTF-8, so I had
> LC_ALL="en_US.ISO8859-1" in my ~/.profile. This worked well for
> mutt.

So here you say the terminal doesn't support UTF-8...

> I then discovered that by changing mutt to load with
> LC_ALL="en_US.UTF-8" that all was well. 

Huh?  These two things seem to be contradictory...  

Also, I'm assuming this message was sent from Mutt NOT using your
non-UTF-8-supporting terminal, since it is indeed encoded in UTF-8 and
contains actual UTF-8 characters...

Anyway, getting back to the normal order of things...

> However today I received an email with the string "Don=E2=80=99t
> know when I will be there next.". This should display as something
> like "Don't know where I will be there next.". In my mutt terminal,
> it displayed:
> > "Don???t know when I will be there next".

The issue is that there are no curly quotes in iso8859-1.  Windows and
classic Mac OS each have their own 8-bit character set that does
include curly quotes (Windows-1252 and MacRoman, respectively), but
unfortunately they use different character codes for them.  Mail
applications are frequently misconfigured to label such text
iso8859-1, because Windows-1252 in particular is mostly identical to
it and it works most places--as long as you're on the same platform as
the sender.

> Thinking this was odd, I dove into my filter.sh script, and
> discovered that no end of hacking would enable me to filter out the
> '=E2=80=99' before display --- there seemed to be some amount of
> parsing before my filter got ahold of it. All that I could match on
> was '???', despite being able to edit the content of the mail
> itself, and see the string '=E2=80=99'. My filter line of
> significance is:
> > output=`echo "$output" | sed "s/[’‘]/$(echo "27" | xxd -p -r)/g"`
> This replaces 'smart quotes' with their ASCII equivalents.

Given that you already have a display filter script, this isn't a
horrible solution--assuming it actually worked.  Note that you have a
couple of harmless bugs though: 

1. The double quotes around 27 are redundant.  Quoting starts afresh
   inside $(...), so they're harmless, but you don't need them.
2. You needlessly fork two additional processes--one for the subshell
   for echo, another for xxd.  This can be greatly simplified to:

   echo "$output" | sed "s/[’‘]/'/g"

   Presumably you avoided this because the single quote is "special"
   to the shell, but since in this case it is enclosed in double
   quotes it loses its specialness.
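
   If the filter ever does receive the raw UTF-8 bytes, a variant that
   does not depend on the locale is possible.  This is only a sketch:
   the function name ascii_quotes is made up for illustration, and it
   assumes the input is UTF-8, where U+2018 and U+2019 are encoded as
   the bytes E2 80 98 and E2 80 99.  Running sed under LC_ALL=C makes
   it match those bytes whatever locale Mutt was started in.

```shell
#!/bin/sh
# Hypothetical helper: replace UTF-8 curly single quotes on stdin
# with a plain ASCII apostrophe.
ascii_quotes() {
    # UTF-8 byte sequences, written as octal escapes for printf:
    lsq=$(printf '\342\200\230')   # U+2018 = E2 80 98
    rsq=$(printf '\342\200\231')   # U+2019 = E2 80 99
    # LC_ALL=C makes sed treat its input as plain bytes.
    LC_ALL=C sed -e "s/$lsq/'/g" -e "s/$rsq/'/g"
}

printf 'Don\342\200\231t know when I will be there next.\n' | ascii_quotes
# prints: Don't know when I will be there next.
```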

> Thinking that this would be a matter of ensuring that the filter
> script had the right character support, I added "export
> LC_ALL="en_US.UTF-8"" to the top of my filter script, however this
> did nothing for me.

Your filter script will run with the same locale as mutt, since it is
a subprocess--it inherits the locale from its parent.  So if mutt was
indeed started with LC_ALL=en_US.UTF-8, then so was your display
filter.  But you shouldn't need to do any of this...
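
You can see the inheritance for yourself with something like the
following.  It is just an illustration: child_locale is a made-up
helper name, and the child shell stands in for your display filter.

```shell
#!/bin/sh
# Hypothetical helper: start a child shell with LC_ALL set to $1 and
# print the LC_ALL value that the child sees in its environment.
child_locale() {
    env LC_ALL="$1" sh -c 'printf "%s\n" "$LC_ALL"'
}

child_locale en_US.ISO8859-1   # prints: en_US.ISO8859-1
```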

> After some messing around, it seemed that the
> only way to get mutt to support the filtering of my problematic
> string  was to call mutt itself with the required character encoding
> (UTF-8).

What character set is the message itself encoded with (according to
its headers)?  If your terminal is set up right, and the charset on
the message is correct, then Mutt should be taking care of this
already for you by running iconv on the message.  Basically, except in
rare cases, if your terminal is set up properly, you shouldn't ever
need to deal with character sets explicitly.
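
To check what the message claims about itself, something like this
could pull the charset parameter out of the raw headers.  It is a
rough sketch: msg_charset is a hypothetical helper name, and it simply
reports the first charset= it finds, so a multipart message with
several parts needs a closer look.

```shell
#!/bin/sh
# Hypothetical helper: print the first charset= parameter found in a
# raw message read from stdin.
msg_charset() {
    # Accept both charset="utf-8" and charset=utf-8; stop at a quote,
    # semicolon, or space.
    sed -n 's/.*[Cc]harset="\{0,1\}\([^"; ]*\).*/\1/p' | head -n 1
}

printf 'Content-Type: text/plain; charset="utf-8"\nContent-Transfer-Encoding: quoted-printable\n' | msg_charset
# prints: utf-8
```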
 
> Is this correct and best-practice, or have I missed something here?
> My installation is currently working by using the 'export
> LC_ALL="en_US.UTF-8"' line in my ~/.profile, however this feels like
> bad practice

Because it is.

But I think you may have one of the rare cases.  I think what's
happening is that Mutt correctly runs iconv to convert your message
from UTF-8 (which it most likely is in) to iso8859-1.  That conversion
partially fails on the annoying curly quotes, which have no equivalent
in iso8859-1, so they have already been replaced with '?' by the time
the text reaches your filter script.

Assuming that's true, the only thing I can think of is an old trick
that iconv supports, which I vaguely remember using in Mutt *ages*
ago.  Try explicitly setting $charset *IN MUTT* to
ISO-8859-1//TRANSLIT, which might or might not help.  But it's likely
to have other negative effects...
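
To get a feel for what //TRANSLIT does before touching your Mutt
configuration, you could try it with the iconv command directly.  This
is a sketch under assumptions: to_latin1_translit is a made-up helper
name, //TRANSLIT is an extension that not every iconv implementation
supports, and what a curly quote turns into depends on the
implementation.

```shell
#!/bin/sh
# Hypothetical helper: convert UTF-8 on stdin to ISO-8859-1, asking
# iconv to approximate characters that have no Latin-1 equivalent.
# //TRANSLIT is an extension (glibc and GNU libiconv accept it); other
# iconv implementations may reject or ignore it.
to_latin1_translit() {
    iconv -f UTF-8 -t 'ISO-8859-1//TRANSLIT'
}

# Example (the result depends on the iconv implementation and locale):
#   printf 'Don\342\200\231t know\n' | to_latin1_translit
#
# The setting suggested above would look like this in ~/.muttrc:
#   set charset="iso-8859-1//TRANSLIT"
```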

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.
