On Fri, Mar 30, 2007 at 05:07:55PM +0600, Christopher Fynn wrote:

Hi,

> IMO these days all browsers should come with their default encoding set
> to UTF-8

What do you mean by a browser's default encoding? Is it the encoding to
be assumed for pages lacking a charset specification? In that case
iso-8859-1 is a much better choice -- there are far more pages out there
in the wild encoded in latin1 that lack charset info than utf8 pages
that lack this info. (Maybe a utf8 auto-detection would be nice,
though.) So my argument for iso-8859-1 is not theoretical but practical.

> and all HTML / XHTML editors should insert UTF-8 as the
> default charset when creating new pages.

Agree. You're also properly using the word "should". This is how they
_should_ work. Unfortunately this is not the way they actually do work.
See for example two bugs in Mozilla/Nvu:

https://bugzilla.mozilla.org/show_bug.cgi?id=315533
https://bugzilla.mozilla.org/show_bug.cgi?id=315543

My experiences with Seamonkey's charset handling in the HTML editor were
even worse than with Mozilla (the 1st bug report).

> Similarly all Linux distributions should use UTF-8 locales as the
> default - and if a user wants to select a non UTF-8 locale at install
> time they should probably receive some kind of mild warning.

Perfectly agree. (Btw our distro doesn't even offer a non-utf8 locale at
installation. I do believe that asking the question "do you want your
system to do things right or wrong?" is always a bad idea. Software
should behave right and should not offer the possibility to behave
wrong.)

> It may display some of them incorrectly because of the overlap of
> characters 128 -> 255 in that codepage and characters Unicode defines in
> that range. Unless your using an east asian codpage, most browsers now
> treat anything beyond 255 as a Unicode character.

I can't see this. Of course two different character sets may assign
different symbols to the same byte. But that's not a problem as long as
the page properly declares its character set.
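
To make the two points above concrete (the declared charset decides how
bytes are read, and utf8 auto-detection with a latin1 fallback for pages
that declare nothing), here is a rough Python sketch. The function name
decode_page and its parameters are made up for illustration; this is not
how any particular browser does it, just one way the idea could look.

```python
from typing import Optional, Tuple


def decode_page(raw: bytes, declared_charset: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (text, charset_used) for a page body.

    Assumption: a declared charset (HTTP header or meta tag) always wins.
    Without one, try strict UTF-8 first and fall back to iso-8859-1.
    """
    if declared_charset:
        # The page says what it is, so there is nothing to guess.
        return raw.decode(declared_charset, errors="replace"), declared_charset
    try:
        # Strict decoding rejects any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # iso-8859-1 maps every byte to a character, so this never fails.
        return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    utf8_bytes = "é".encode("utf-8")          # two bytes
    latin1_bytes = "é".encode("iso-8859-1")   # one byte
    sjis_bytes = "日本".encode("shift_jis")    # an East Asian codepage

    print(decode_page(utf8_bytes))                 # detected as utf-8
    print(decode_page(latin1_bytes))               # falls back to iso-8859-1
    print(decode_page(utf8_bytes, "iso-8859-1"))   # wrong label gives mojibake
    print(decode_page(sjis_bytes, "shift_jis"))    # declared, so unambiguous
```

The fallback order relies on the observation that latin1 text containing
non-ASCII characters only rarely happens to be valid UTF-8, so a strict
UTF-8 decode that succeeds is a fairly strong hint. It is a heuristic,
not a guarantee.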

The page or the HTTP protocol tells which Asian codepage to use, and
then the browser interprets the bytes according to this charset and
displays the result. I can't see any case where it might be ambiguous.
Could you please provide a concrete example?

-- 
Egmont

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
