>> Correct me if I'm wrong, but isn't the web server supposed to tell the
>> client which charset is used: Latin2 or UTF-8? [...]
>
> You're perfectly right and understand the whole concept of encodings.
>
> What I'm arguing with Rich is the following situation:
>
> X writes a homepage in French, using either latin1 or utf8 encoding (but
> mentions this encoding properly), and of course he uses all the French
> letters, including e.g. è (e with grave accent).
>
> Y is sitting in Poland for example, using a system configured to use a
> latin2 locale by default. Latin2 lacks e with grave accent. Y visits the
> homepage of X with some popular graphical web browser.
>
> What should happen?
>
> Rich says that his browser must (or should?) think in latin2 and hence
> drop the è letters, maybe replace them with unaccented e or question
> marks or similar.
>
> I say that his browser must show è correctly, it doesn't matter what its
> locale is.

That depends on the configuration of the browser.

The browser should by default (programmer's choice really) think in the
encoding X used, since it's tagged with that encoding information.
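
As a rough sketch of what "thinking in the encoding X used" could mean,
the Python below reads the charset from a Content-Type header and decodes
the page bytes with it. The function name decode_page and the UTF-8
default are assumptions made for illustration, not how any particular
browser works.

```python
from email.message import Message


def decode_page(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode page bytes using the charset named in a Content-Type value.

    Hypothetical helper: falls back to `default` when no charset is given.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    return body.decode(charset)


# The same French text, served once as latin1 and once as utf8.
latin1_page = "très bien".encode("latin-1")
utf8_page = "très bien".encode("utf-8")

text_a = decode_page(latin1_page, "text/html; charset=ISO-8859-1")
text_b = decode_page(utf8_page, "text/html; charset=UTF-8")
```

Both calls give the same text, whatever Y's locale is, because the bytes
are interpreted according to the tag and not according to the locale.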

If Y's computer supports the encoding X used (it doesn't have to be Y's
preferred encoding), the browser should use X's encoding when showing Y
the page, unless Y instructs the browser to convert the page to Y's
preferred encoding, either automatically (a preference setting) or
manually (a menu choice or similar).

If Y's computer doesn't support the encoding X used, the browser should,
as a fallback, try to convert the page to Y's encoding if possible. If
the letter "è" isn't available there, it should be replaced by another
character (such as "?") or by a symbol indicating that some data couldn't
be converted. It's also nice if the browser explains that a conversion
was made: probably not as an alert, which is too intrusive unless it
offers to install the missing encoding support, but perhaps in the
information bar or status bar. Such an explanation would get the user
more interested in upgrading the system to support more encodings.
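
The fallback conversion could look roughly like this in Python. It is
only a sketch: to_preferred and the wording of the notice are made up for
illustration, and it assumes the page text has already been decoded.

```python
from typing import List, Tuple


def to_preferred(text: str, target: str) -> Tuple[bytes, List[str]]:
    """Convert text to the target encoding, replacing what does not fit.

    Hypothetical helper: returns the converted bytes plus the list of
    characters that had to be replaced, so the user can be told about it.
    """
    lost: List[str] = []
    for ch in text:
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            if ch not in lost:
                lost.append(ch)
    # errors="replace" substitutes "?" for each unencodable character.
    converted = text.encode(target, errors="replace")
    return converted, lost


page = "très élégant"
converted, lost = to_preferred(page, "iso8859-2")

# Text for a status bar message, empty when nothing was lost.
notice = ("Could not show: " + ", ".join(lost)) if lost else ""
```

Here latin2 keeps é but has no è, so only è ends up in the list of lost
characters and is shown as "?".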

I think clipboards treat the data as bytes, so if Y wants to copy from X's
page and paste into program P, Y has to make sure that the browser
converts the data to Y's preferred encoding before copying. Otherwise P's
input validation would (or at least should) complain when pasting.
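
To illustrate what goes wrong if the clipboard really does carry raw
bytes (an assumption, as said above), here is a small Python sketch. The
paste function stands in for program P and is purely hypothetical.

```python
def paste(clipboard: bytes, expected: str) -> str:
    """Hypothetical program P: interpret pasted bytes in its own encoding."""
    return clipboard.decode(expected, errors="strict")


# Bytes copied unconverted from a latin1 page containing "très".
raw = "très".encode("latin-1")

# A latin2 program accepts the bytes but silently shows the wrong letter,
# because byte 0xE8 means č in latin2.
as_latin2 = paste(raw, "iso8859-2")

# A utf8 program can detect the problem, since the bytes are not valid utf8.
try:
    paste(raw, "utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

With single-byte encodings like latin2 the validation cannot catch
everything, which is one more reason to convert before copying.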

Sincerely,
Fredrik

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
