On Fri, Mar 30, 2007 at 05:17:32PM +0200, Fredrik Jervfors wrote:
> > I say that his browser must show è correctly, it doesn't matter what its
> > locale is.
>
> That depends on the configuration of the browser.
>
> The browser should by default (programmer's choice really) think in the
> encoding X used, since it's tagged with that encoding information.
>
> If Y's computer supports the encoding X used (it doesn't have to be Y's
> preferred encoding), the browser should use X's encoding when showing Y

What does “supports the encoding” mean? Applications cannot select the
locale they run in, aside from requesting the “C” or “POSIX” locale. It’s
the decision of the user and/or the system implementor. In fact it would
be impossible to switch locales when visiting different pages anyway. How
would you deal with multiple browser windows or tabs, or even frames?

> If Y's computer doesn't support the encoding X used, the browser should,
> as a fallback solution, try to convert the page to Y's encoding if
> possible.

This is why I’m confused about what you mean by “support the encoding”.
The app cannot switch its native encoding (the locale), so supporting the
encoding would have to mean supporting it as an option for conversion...
But then, if the system doesn’t “support” it in this sense, how would you
go about converting?

Normal implementations work either by converting all data to the user’s
encoding, or by converting it all to some representation of Unicode
(UTF-8 or UTF-32, or something nonstandard like UTF-21).

> I think clipboards treat the data as bytes, so if Y wants to copy from X's
> page and paste it into program P, Y has to make sure that the browser
> converts the data to Y's preferred encoding before copying, since P's
> input validation would (should) complain otherwise (when pasting).

X selection thinks in ASCII or UTF-8. Technically the ASCII mode can also
be used for Latin-1, but IMO it’s a bad idea to continue to support this
since it’s obviously a broken interface. There’s also a nasty scheme based
on ISO-2022 which should be avoided at all costs.

So, in order to communicate cleanly via the X selection, X apps need to be
able to convert their data to and from UTF-8. In a way I think this is
bad, because it makes things difficult for apps, but the motivation seems
to be at least somewhat correct.
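To make the conversion step concrete, here is a rough sketch in Python of what an app would have to do around the selection. This is only an illustration under assumptions: the function names (native_encoding, to_selection, from_selection) are made up for this example and are not part of any X library, and replacing unrepresentable characters with "?" on the way in is just one possible policy.

```python
import codecs
import locale


def native_encoding() -> str:
    """Return the encoding of the user's locale (e.g. 'UTF-8', 'ISO-8859-1')."""
    return locale.getpreferredencoding(False)


def _is_utf8(encoding: str) -> bool:
    # codecs.lookup() normalizes aliases such as 'UTF-8', 'utf8', 'U8'.
    return codecs.lookup(encoding).name == "utf-8"


def to_selection(data: bytes, native: str) -> bytes:
    """Hypothetical helper: convert locale-encoded bytes to UTF-8 for interchange."""
    if _is_utf8(native):
        return data  # no-op on UTF-8-only systems (no validation is done here)
    return data.decode(native).encode("utf-8")


def from_selection(data: bytes, native: str) -> bytes:
    """Hypothetical helper: convert UTF-8 selection data to the locale's encoding."""
    if _is_utf8(native):
        return data  # no-op on UTF-8-only systems (no validation is done here)
    # Assumed policy: characters the locale cannot represent become '?'.
    return data.decode("utf-8").encode(native, errors="replace")
```

An app running in a Latin-1 locale would call to_selection(text, native_encoding()) before owning the selection and from_selection(data, native_encoding()) after receiving it. Note that the second direction is lossy whenever the sender used characters the receiver's locale cannot represent.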
There’s no reason to expect that other X clients are even running on the
same machine, and the machines they’re running on might use different
encodings, so a universal encoding is needed for interchange.

It would be nice if xlib provided an API to convert the data to and from
the locale’s encoding automatically upon sending and receiving it,
however. (This could be a no-op on UTF-8-only systems.)

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
