Re: Perl Unicode support

Rich Felker Fri, 30 Mar 2007 07:34:41 -0800

On Fri, Mar 30, 2007 at 05:17:32PM +0200, Fredrik Jervfors wrote:
> > I say that his browser must show è correctly, it doesn't matter what its
> > locale is.
> 
> That depends on the configuration of the browser.
> 
> The browser should by default (programmer's choice really) think in the
> encoding X used, since it's tagged with that encoding information.
> 
> If Y's computer supports the encoding X used (it doesn't have to be Y's
> preferred encoding), the browser should use X's encoding when showing Y


What does “supports the encoding” mean? Applications cannot select the
locale they run in, aside from requesting the “C” or “POSIX” locale.
It’s the decision of the user and/or the system implementor. In fact
it would be impossible to switch locales when visiting different pages
anyway. How would you deal with multiple browser windows or tabs, or
even frames?

> If Y's computer doesn't support the encoding X used, the browser should,
> as a fallback solution, try to convert the page to Y's encoding if
> possible.

This is why I’m confused about what you mean by “support the
encoding”. The app cannot switch its native encoding (the locale), so
supporting the encoding would have to mean supporting it as an option
for conversion... But then, if the system doesn’t “support” it in this
sense, how would you go about converting?
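
One way to make "supports the encoding" concrete is "a converter for it
is available". A minimal sketch in Python, where can_convert is a
hypothetical helper name and Python's codec registry stands in for
whatever conversion tables the system actually has:

```python
import codecs


def can_convert(encoding_name):
    """Return True if a converter for encoding_name is available.

    Hypothetical helper: it only checks Python's codec registry,
    not the locale or any system-level conversion library.
    """
    try:
        codecs.lookup(encoding_name)
        return True
    except LookupError:
        return False
```

If this returns False for the page's encoding, there is nothing to
convert with, which is the problem with the fallback described above.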

Normal implementations work either by converting all data to the
user’s encoding, or by converting it all to some representation of
Unicode (UTF-8 or UTF-32, or something nonstandard like UTF-21).
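
As a rough illustration of the two approaches, here is a sketch in
Python. The function names are made up for this example, and the
choice to substitute "?" for unrepresentable characters is an
assumption, not something any particular browser is known to do:

```python
def decode_page(raw, declared_charset):
    """Decode page bytes to a Unicode string, using the charset
    the page is tagged with."""
    return raw.decode(declared_charset)


def encode_for_user(text, user_encoding):
    """Encode text in the user's encoding, substituting '?' for
    characters that encoding cannot represent (an assumption)."""
    return text.encode(user_encoding, errors="replace")


page = b"caf\xe8"                             # ISO-8859-1 bytes, e-grave last
text = decode_page(page, "iso-8859-1")        # internal Unicode form
utf8_bytes = encode_for_user(text, "utf-8")   # lossless
ascii_bytes = encode_for_user(text, "ascii")  # lossy, e-grave becomes "?"
```

Converting to the user's encoding can lose characters, as the last
line shows. Converting to a Unicode representation cannot.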

> I think clipboards treat the data as bytes, so if Y wants to copy from X's
> page and paste it into program P, Y has to make sure that the browser
> converts the data to Y's preferred encoding before copying, since P's
> input validation would (should) complain otherwise (when pasting).

The X11 selection thinks in ASCII or UTF-8. Technically the ASCII mode can
also be used for Latin-1, but IMO it’s a bad idea to continue to
support this since it’s obviously a broken interface. There’s also a
nasty scheme based on ISO-2022 which should be avoided at all costs.
So, in order to communicate cleanly via the X selection, X apps need
to be able to convert their data to and from UTF-8.

In a way I think this is bad, because it makes things difficult for
apps, but the motivation seems to be at least somewhat correct.
There’s no reason to expect that other X clients are even running on
the same machine, and the machines they’re running on might use
different encodings, so a universal encoding is needed for
interchange. It would be nice if Xlib provided an API to convert the
data to and from the locale’s encoding automatically upon sending and
receiving it, however. (This could be a no-op on UTF-8-only systems.)
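
To show what such a conversion layer might look like, here is a sketch
in Python. It is not an Xlib API; to_selection and from_selection are
hypothetical names, and the locale's encoding is passed in explicitly
rather than queried from the system:

```python
import codecs


def _is_utf8(encoding_name):
    # Normalize aliases such as "UTF-8", "utf8" and "U8" to one name.
    return codecs.lookup(encoding_name).name == "utf-8"


def to_selection(data, locale_encoding):
    """Convert locale-encoded bytes to UTF-8 for the selection."""
    if _is_utf8(locale_encoding):
        return data  # no-op on UTF-8-only systems (no validation done)
    return data.decode(locale_encoding).encode("utf-8")


def from_selection(data, locale_encoding):
    """Convert UTF-8 bytes from the selection to the locale's encoding.

    Characters the locale cannot represent become '?' (an assumption).
    """
    if _is_utf8(locale_encoding):
        return data
    return data.decode("utf-8").encode(locale_encoding, errors="replace")
```

With a Latin-1 locale, to_selection(b"\xe8", "iso-8859-1") gives the
two-byte UTF-8 form of è, and from_selection reverses it.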

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
