Re: [Nmh-workers] General question - unsupported charset conversion

Ken Hornstein Fri, 28 Feb 2014 10:50:38 -0800

>Unfortunately, I have a lot of experience, and a lot of trouble, with
>character set conversion.


Well, if you just bit the bullet and switched to UTF-8, you wouldn't have
all of these problems! :-)

>> Should we return the original bytes?  
>
>It is not the best idea. Some byte sequences are terminal control
>sequences, and they sometimes leave the terminal in an unusable state.

Seems fine to me.
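
To make the terminal concern above concrete, here is a minimal sketch (in
Python, purely for illustration; it is not nmh code, and the function name
sanitize_for_terminal is hypothetical) of one way to show unconverted bytes
without letting control sequences reach the terminal. It assumes that
everything outside printable ASCII, tab, and newline should be escaped.

```python
def sanitize_for_terminal(raw: bytes) -> str:
    """Return raw bytes as text that is safe to print on a terminal.

    Printable ASCII, tab, and newline pass through unchanged. Every other
    byte (ESC, other control characters, and anything >= 0x7f) is shown
    as a visible \\xNN escape instead of being sent to the terminal.
    """
    out = []
    for b in raw:
        if b in (0x09, 0x0A) or 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)
```

With this, an embedded ESC byte is displayed as the four characters \x1b
rather than being interpreted as the start of an escape sequence.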

>> An error? [..]  Some string which says, "We cannot convert
>> klingon-8842 to us-ascii" or the equivalent?
>> 
>
>In practice it means spam in an exotic language, and at that point I
>know that I do not want to read such a message.

I can see that, but I'm not sure that's an appropriate choice for all
cases (like, for instance, MIME parameters).

>> - What to do when we cannot convert a particular character.  This is a
>> little more clear; the general trend is to use a substitution
>> character.
>
>This is very frequent and causes a lot of trouble. An entire message is
>in English, with one foreign family name in its original spelling. The
>message is sent in UTF-8, but (suppose) my terminal supports only ASCII.
>Conversion would fail.

Errr ... really?  In the case I'm thinking of, the one foreign family
name would have the offending character output as a '?' (or whatever).
The conversion would go through fine.
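
As a rough illustration of the substitution-character behavior described
above (a Python sketch, not nmh code; the helper name to_ascii_lossy is
made up for this example), the standard "replace" error handler swaps each
unconvertible character for '?' and lets the rest of the text through:

```python
def to_ascii_lossy(text: str) -> str:
    """Convert text to ASCII, substituting '?' for each character that
    has no ASCII representation instead of failing the whole conversion."""
    return text.encode("ascii", errors="replace").decode("ascii")


# The English text survives; only the two non-ASCII letters are replaced.
print(to_ascii_lossy("Regards, Lech Wałęsa"))  # Regards, Lech Wa??sa
```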

>In my personal opinion a very good choice is conversion into
>html-entities, like &aogon; or &lstrok; . It remains quite readable and
>is still unique enough to convert it back in case of need.

Um, ouch.  Unless there's a common library that already implements
that behavior, that's not on the table at all.
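
For reference, the entity idea can be sketched as follows. This is Python
and leans on its standard html.entities table, so it says nothing about
whether a suitable C library exists for nmh; the function name to_entities
is hypothetical. It assumes ASCII passes through untouched, named entities
are preferred, and numeric references are the fallback. A literal '&' in
the input is not escaped here, so a strict round trip would need that too.

```python
import html
from html.entities import html5

# Reverse map: single character -> named entity (including the ';').
# Where several names map to one character, pick the shortest, then the
# alphabetically first, so the choice is deterministic.
_NAMES = {}
for name, value in html5.items():
    if name.endswith(";") and len(value) == 1:
        best = _NAMES.get(value)
        if best is None or (len(name), name) < (len(best), best):
            _NAMES[value] = name


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with a named HTML entity, or a
    numeric character reference when no named entity is known."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in _NAMES:
            out.append("&" + _NAMES[ch])
        else:
            out.append(f"&#{ord(ch)};")
    return "".join(out)


print(to_entities("Wałęsa"))                 # Wa&lstrok;&eogon;sa
print(html.unescape(to_entities("Wałęsa")))  # Wałęsa
```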

--Ken

_______________________________________________
Nmh-workers mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/nmh-workers
