>There are some problems with this:
>
>Like hymie! already noticed, it looks pretty ugly.
>
>It's hard to read. So when you open an old draft you can't just reread
>the subject and recipient to remind you of the topic and reason for the
>draft.

I'm not disagreeing with you here, but ... well, the problem is that
there are no good solutions.  (Also, "scan" should be able to decode
those headers fine).

>> The default is to use RFC 2047 decoding (the =?...?= thing) for
>> display, but copy the raw headers verbatim when replying.
>
>Sadly there are some mail programs which use RFC 2047 for everything,
>so they even encode the "Re:" in the subject. This then leads to a
>"Re: Re: ..." subject in the answer. When the transfer encoding is
>base64, it's quite hard to notice.

Hm.  My reading of the code is that we actually handle that case (but
of course, it's not a given other MUAs do that).
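
For what it's worth, here is roughly what the decode side looks like.
This is only an illustrative sketch (Python for brevity, example subject
made up), not nmh's actual code: a Subject where the sender encoded the
whole header, "Re: " included, so the base64 hides the prefix in the
raw draft.

    from email.header import decode_header, make_header

    # The sender encoded the entire Subject, "Re: " and all; with base64
    # the prefix is invisible in the raw header.
    raw = "=?UTF-8?B?UmU6IEdyw7zDn2UgYXVzIETDvHNzZWxkb3Jm?="

    decoded = str(make_header(decode_header(raw)))
    print(decoded)                        # Re: Grüße aus Düsseldorf

    # Check for an existing "Re:" on the *decoded* text before prepending
    # another one, so the reply doesn't become "Re: Re: ...".
    if not decoded.lower().startswith("re:"):
        decoded = "Re: " + decoded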

>> The reason for this is that there is no guarantee that the character
>> set in the encoded header will match the user's chosen character set,
>> and there is a very real possibility that the subject will get mangled
>> if the character sets involved don't have a valid conversion for some
>> of the characters (this is further complicated by the insistence of
>> a number of people on using us-ascii as their chosen character set, for
>> reasons that elude me).  Using the raw header for replies ensures this
>> doesn't happen.
>
>If I remember the RFC 2047 implementation correctly, it should be
>possible to check whether the string can be completely translated to
>the local charset. With such a check we could decode it when possible
>without loss of information and leave it RFC 2047-encoded when not[0].
>
>There are still some problems, like having special chars (e.g. a comma)
>in an encoded display name. But I would say it's possible to handle this
>without writing RFC 2047-encoded text into a draft when it's not
>necessary.
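
The comma case is actually a good illustration of why "just decode it"
isn't the whole story.  A rough sketch of the failure mode (Python purely
for brevity, the address is made up): if the decoded display name is
written back into a draft without re-quoting, the header means something
different.

    from email.utils import getaddresses

    ok     = getaddresses(['"Müller, Hans" <hm@example.org>'])
    broken = getaddresses(['Müller, Hans <hm@example.org>'])

    print(ok)      # [('Müller, Hans', 'hm@example.org')]         -- one recipient
    print(broken)  # [('', 'Müller'), ('Hans', 'hm@example.org')] -- parsed as two

So a "decode when lossless" scheme also has to re-apply RFC 5322 quoting,
not just get the character set conversion right.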

The problem with those ideas is that they are complicated to implement
given the current architecture.  I'll give you some specific issues:

- There's no easy way to determine whether there are characters that
  cannot be translated into the output character set without actually
  attempting the conversion.  And the current APIs don't have any
  concept of "see if we can translate this encoding", which would require
  some re-architecting.  (A sketch of the kind of check involved is
  below, after this list.)

- The rules about RFC 2047 encoding/decoding are more complicated than
  they appear at first glance; it's not so easy to encode just the part
  of a header that contains the special character, which is why almost
  everybody encodes the complete header.
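
For reference, the check being proposed is conceptually just "attempt
the conversion and see if anything is lost".  A minimal sketch (Python
only to show the idea; "iso-8859-1" stands in for whatever the user's
chosen character set is, and the current APIs have no equivalent entry
point):

    def representable(text, local_charset="iso-8859-1"):
        """True if every character converts to local_charset without loss."""
        try:
            text.encode(local_charset)  # strict mode raises on any unmappable char
            return True
        except UnicodeEncodeError:
            return False

    print(representable("Grüße aus Düsseldorf"))  # True: all of it is in Latin-1
    print(representable("Θέμα: σχέδιο"))          # False: Greek isn't in Latin-1

Doing the equivalent in the existing C code is where the re-architecting
comes in.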

These problems are solvable, but they're complicated.  It's a lot of work
when the current implementation is very robust and the main complaint
against it is "it's ugly".  If someone wants to
tackle it, they are welcome, but it's not something I would personally
work on.

--Ken
