Re: perl unicode support

Egmont Koblinger Fri, 30 Mar 2007 03:31:32 -0800

On Fri, Mar 30, 2007 at 05:07:55PM +0600, Christopher Fynn wrote:

Hi,


> IMO these days all browsers should come with their default encoding set 
> to UTF-8

What do you mean by a browser's default encoding? Is it the encoding to be
assumed for pages lacking a charset specification? In that case iso-8859-1 is
a much better choice -- there are far more pages out there in the wild
encoded in latin1 that lack charset info than utf8 pages that lack this
info. (Maybe utf8 auto-detection would be nice, though.) So my argument
for iso-8859-1 is not theoretical but practical.
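
To make the auto-detection idea concrete, here is a minimal sketch in Python.
It is only an illustration, not how any particular browser works: the function
name decode_unlabelled is made up, and the rule "try strict utf8 first,
otherwise assume iso-8859-1" is just one plausible heuristic.

```python
from typing import Tuple


def decode_unlabelled(data: bytes) -> Tuple[str, str]:
    """Decode bytes that carry no charset information.

    Hypothetical helper: try strict UTF-8 first, and fall back to
    ISO-8859-1, which maps every byte to a character and so never fails.
    Returns the decoded text and the name of the charset that was used.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    # "café" encoded once as UTF-8 and once as Latin-1.
    for raw in ("café".encode("utf-8"), "café".encode("iso-8859-1")):
        text, charset = decode_unlabelled(raw)
        print(charset, repr(text))
```

This works reasonably well because latin1 text containing non-ASCII bytes is
rarely valid utf8 by accident, so a failed strict decode is a good hint that
the page is in a legacy encoding.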

> and all HTML / XHTML editors should insert UTF-8 as the 
> default charset when creating new pages.

Agreed. You're also properly using the word "should". This is how they
_should_ work. Unfortunately this is not the way they actually do work.
See for example two bugs in Mozilla/Nvu:
https://bugzilla.mozilla.org/show_bug.cgi?id=315533
https://bugzilla.mozilla.org/show_bug.cgi?id=315543

My experiences with Seamonkey's charset handling in the html editor were
even worse than with Mozilla (the 1st bug report).

> Similarly all Linux distributions should use UTF-8 locales as the 
> default - and if a user wants to select a non UTF-8 locale at install 
> time they should probably receive some kind of mild warning.

I fully agree. (Btw our distro doesn't even offer a non-utf8 locale at
installation. I do believe that asking the question "do you want your system
to do things right or wrong?" is always a bad idea. Software should behave
correctly and should not offer the possibility to behave incorrectly.)


> It may display some of them incorrectly because of the overlap of 
> characters 128 -> 255 in that codepage and characters Unicode defines in 
> that range. Unless your using an east asian codpage, most browsers now 
> treat anything beyond 255 as a Unicode character.

I can't see this. Of course two different character sets may define
different symbols for the same byte. But that's not a problem as long as the
page properly declares its character set. The page itself or the HTTP response
tells the browser which Asian codepage to use, and then the browser interprets
the bytes according to this charset and displays the result. I can't see any
case where it might be ambiguous. Could you please provide a concrete example?
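
To show what I mean by interpreting the bytes according to the declared
charset, here is a small Python sketch. The sample bytes and the helper name
decode_declared are my own choices for illustration. The same byte maps to
different characters under different charsets, but once the charset is
stated, each decode has exactly one result.

```python
# The same bytes, interpreted under different declared charsets.
SAMPLES = [
    (b"\xf5", "iso-8859-1"),     # o with tilde (U+00F5)
    (b"\xf5", "iso-8859-2"),     # o with double acute (U+0151)
    (b"\x93\xfa", "shift_jis"),  # one two-byte East Asian character (U+65E5)
]


def decode_declared(data: bytes, charset: str) -> str:
    """Decode bytes exactly as the declared charset says; no guessing."""
    return data.decode(charset)


if __name__ == "__main__":
    for data, charset in SAMPLES:
        text = decode_declared(data, charset)
        # Print the resulting code points so the output is terminal-independent.
        print(charset, [f"U+{ord(ch):04X}" for ch in text])
```

Note that the result is always a sequence of Unicode code points, whichever
legacy charset the bytes came from; the byte values 128-255 in the source
encoding never collide with the Unicode characters in that numeric range.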


-- 
Egmont

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
