On Tue, Feb 10, 2009 at 06:57:04PM +0100, Vincent Lefevre wrote:
> On 2009-02-10 18:16:43 +0200, Adrian Bunk wrote:
> > On Tue, Feb 10, 2009 at 04:33:23PM +0100, Vincent Lefevre wrote:
> > > FYI, I prefer the current one because iso-8859-1 takes less space
> > > than utf-8 (note that on the network, mail is not compressed),
> >
> > It only makes a difference if you use non-ASCII characters AND
> > no characters outside iso-8859-1 (like the € sign) in an email.
>
> So is the change of $send_charset. So, I suppose that these cases
> are important enough.

Having iso-8859-1 preferred over UTF-8 was a good choice back in
2000 when $send_charset was set this way in init.h, since back then
UTF-8 support in MUAs was not always good.
Now, in 2009, that's no longer a problem.

> > And the size advantage in these cases would typically be something
> > around 1%, so not really noticeable.
>
> This depends on the language and the length of the message.
> There's much more than 1% of accented characters in French text,
> for instance.

But there are also the characters œ and Œ in French text, and they are
not in iso-8859-1.

> So, it can be noticeable.

It is only "noticeable" if you manually count bytes.
Even if it were 10%, it would make no difference in practice: emails
get big when someone adds a 1MB attachment, and the few extra bytes in
the email body hardly matter next to that.

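To put rough numbers on this, here is a small Python sketch. It is only
an illustration: the sample sentence and the helper names encoded_sizes
and fits_latin1 are made up for this mail, and nothing here is taken
from mutt.

```python
def encoded_sizes(text):
    """Return (iso-8859-1 size, UTF-8 size) in bytes for text."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


def fits_latin1(text):
    """Return True if every character of text exists in iso-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample sentence with common French accented letters.
sample = "Le café est déjà très sucré à Noël."
latin1_size, utf8_size = encoded_sizes(sample)
# Each accented letter in this sample costs one extra byte in UTF-8.
print(latin1_size, utf8_size)

# œ and € are outside iso-8859-1, so such a mail ends up as UTF-8 anyway.
print(fits_latin1("cœur"), fits_latin1("10 €"))
```

The overhead is one byte per accented letter, so it is bounded by the
share of accented letters in the body.
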
> > > Also, using "us-ascii:utf-8" will not affect received mail, so that
> > > if a user wants to deal with UTF-8 only, he must have some tools for
> > > charset conversion when receiving mail (and changing $send_charset
> > > would just be some minor configuration change for a specific usage).
> >
> > As already discussed, having more charsets in the mix can cause problems
> > when sending patches in the body of an email (e.g. when submitting
> > patches to linux-kernel).
>
> Well, your tools must cope with messages with different charsets in
> a mailbox (and encodings other than 7bit/8bit). If they don't, they
> are broken.
>
> Also, this is for a specific usage. Other users may prefer iso-8859-1
> (when possible) for their specific usage. There's no default that
> would make everyone happy.

UTF-8 has the following advantages over iso-8859-1:
- it can handle all characters in one charset
  (iso-8859-1 won't work without the fallback to UTF-8)
- it has already become more or less the standard charset
  under Linux

Globally, it's also a huge improvement that everyone is moving away
from a gazillion different charsets to UTF-8.
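
The fallback mentioned above can be sketched like this. It assumes, as
I understand the documentation, that mutt walks the colon-separated
$send_charset list and uses the first charset into which the text
converts exactly. pick_charset is a hypothetical helper written for
this mail, not mutt code, and returning None when nothing fits is a
simplification.

```python
def pick_charset(text, send_charset="us-ascii:iso-8859-1:utf-8"):
    """Return the first charset in the list that can encode text exactly.

    Illustration only: models the first-match rule, not mutt's real code.
    """
    for name in send_charset.split(":"):
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text, try the next
        return name
    return None  # simplification: no charset in the list fits


# With iso-8859-1 in the list, the outgoing charset depends on the content.
for body in ("hello", "café", "cœur", "10 €"):
    print(repr(body), pick_charset(body))

# With the shorter list, every non-ASCII body is sent as UTF-8.
for body in ("hello", "café", "cœur", "10 €"):
    print(repr(body), pick_charset(body, "us-ascii:utf-8"))
```

With "us-ascii:utf-8", pure ASCII mail is unaffected and everything
else goes out as UTF-8, so only two charsets remain in the mix.
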
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed