Re: mhfixmsg character set conversion

David Levine Sun, 13 Feb 2022 18:58:27 -0800

Steven Winikoff writes:

> Unfortunately, running it through mhfixmsg results in the message coming
> back unchanged.  Is that specifically about -decodeheaderfieldbodies, or
> is mhfixmsg doing nothing because the message body is already unencoded
> text/plain?


That's because -decodeheaderfieldbodies utf8 only decodes UTF-8 text.

There was a reason for only allowing decoding of UTF-8 header field
bodies.  If any character set could be decoded, it would be possible
to produce header field bodies with embedded nulls, which I expect
would result in incorrect message parsing.  It certainly would with
scan(1):  it would truncate a Subject with an embedded null.

That can't happen with UTF-8 encoded text, assuming it doesn't contain
any single-byte NUL octets.  In addition to decoding UTF-8, we could
decode ASCII because 1) we've seen it in the wild, 2) it seems as
harmless as it is pointless to encode ASCII as ASCII, assuming no
NULs, and 3) it's a proper subset of UTF-8 so it doesn't interfere
with the semantics of the "-decodeheaderfieldbodies utf8" switch.

Any other suggestions?  If there's an enumeration of character
encodings that can't have NULs, we could extend the switch to cover
those as well.
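
To illustrate the concern, here is a minimal Python sketch.  It is not
how mhfixmsg works internally; it only checks whether the octets of a
piece of text in a given character set include a NUL.  The has_nul
helper and the sample text are hypothetical, chosen for illustration.

```python
def has_nul(text: str, charset: str) -> bool:
    """Return True if encoding `text` in `charset` yields any NUL octet."""
    return b"\x00" in text.encode(charset)


# Hypothetical sample text; any ASCII-only string behaves the same way.
sample = "Network World"

for charset in ("us-ascii", "utf-8", "iso-8859-1", "utf-16-le", "utf-32-le"):
    # Single-byte charsets and UTF-8 never emit 0x00 for these characters;
    # UTF-16 and UTF-32 pad each ASCII character with zero octets.
    print(f"{charset:12} contains NUL: {has_nul(sample, charset)}")
```

With ASCII-only text, only the UTF-16 and UTF-32 lines should report a
NUL, which is the case that would confuse a NUL-terminated string
consumer such as scan(1).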

> But today I sent myself a message using an IMAP-based app on my phone,
> resulting in the appended, and I'd definitely want to decode the Subject:
> header.

So I'm curious, why is the ASCII encoded as ASCII?  Why not just fold
the header as usual?  This line is too long; I'm not sure whether that
is related or a separate issue:

Subject: =?US-ASCII?Q?Using_the_Linux_fold_command_to_mak?=
=?US-ASCII?Q?e_text_more_readable_=7C_Network_World?=
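
For reference, a rough sketch of what decoding that field body yields,
using Python's standard email.header module rather than mhfixmsg.  It
assumes the two encoded-words are separated by a single space; the
variable names are only for illustration.

```python
from email.header import decode_header

# The Subject field body from above, with the two encoded-words
# assumed to be separated by a single space.
raw = ("=?US-ASCII?Q?Using_the_Linux_fold_command_to_mak?= "
       "=?US-ASCII?Q?e_text_more_readable_=7C_Network_World?=")

# decode_header returns (chunk, charset) pairs; a chunk is bytes when it
# came from an encoded-word and str when it was unencoded text.
parts = decode_header(raw)
decoded = "".join(
    chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
    for chunk, charset in parts
)

print(decoded)
# Check for the embedded-NUL problem discussed earlier.
print("contains NUL:", "\x00" in decoded)
```

Whitespace between adjacent encoded-words is dropped on decoding, so
"mak" and "e" rejoin into "make", and =7C decodes to "|".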

David
