On Thu, Mar 29, 2007 at 01:05:57PM -0400, Rich Felker wrote:

> Gtk+-2's approach is horribly incorrect and broken. By default it
> writes UTF-8 filenames into the filesystem even if UTF-8 is not the
> user's encoding.
There's an environment variable that tells Gtk+-2 to use the legacy
encoding in filenames. Whether forcing UTF-8 on filenames is a good
idea is genuinely questionable; you're right about that. But I'm not
just talking about filenames: many more strings are handled inside
Glib/Gtk+. Strings coming from gettext that will be displayed on the
screen, error messages originating from libc's strerror, strings typed
by the user into entry widgets, and so on. Gtk+-2 uses UTF-8
everywhere, and (except for the filenames) that's clearly a wise
decision.

> Not independently. All they have to do is convert it to the local
> encoding. And yes I'm quite aware that a lot of information might be
> lost in the process. That's fine. If users want to be able to read
> multilingual text, they NEED to migrate to a character encoding that
> supports multilingual text. Trying to "work around" this [non-]issue
> by mixing encodings and failing to respect LC_CTYPE is a huge hassle
> for negative gain.

I think this is just plain wrong. How long have you been browsing the
net and reading accented pages? How long have you been using a UTF-8
locale? I used Linux with a Latin-2 locale from 1996 on. It was around
2003 that I began using UTF-8 occasionally, and only last year that I
finally managed to switch fully to UTF-8. There are still several
applications that are a nightmare with UTF-8 (Midnight Commander, for
example). A few years ago the situation was much worse: much software
was simply not ready for UTF-8, and switching would have been nearly
impossible. When did you switch to Unicode? Probably a few years
earlier than I did, but I bet you also had those old-fashioned 8-bit
days...

So, I used Linux for 10 years with an 8-bit locale set up. Still I
could visit French, Japanese etc. pages and the letters appeared
correctly. Believe me, I would have switched to Windows or whatever if
Linux browsers hadn't been able to perform this pretty simple job.
It's not about workarounds or non-issues.
If a remote server tells my browser to display a kanji, then my
browser _must_ display a kanji, even if my default charset doesn't
contain it. Having an old-fashioned system configuration is no excuse
for an application not to display the characters properly. (Except for
terminal applications, which are forced to use the charset I use.)

> > Show me your code that you think "just works" and I'll show you
> > where you're wrong. :-)
>
> Mutt is an excellent example.

As you might see from the headers of my messages, I'm using Mutt too.
In this regard Mutt is a nice piece of software that handles accented
characters correctly (nearly) always. In order to do this, it has to
be aware of the charset of each message (and of its parts) and the
charset of the terminal, and it has to convert between them plenty of
times. The fact that it does its job (mostly) correctly implies that
the authors didn't just write "blindly copy the bytes from the message
to the terminal" kinds of functions; they took charset issues into
account and converted the strings whenever necessary.

From a user's point of view, accent handling in Mutt "just works".
This is because the developers took care of it. If the developers had
thought that "copying those bytes from the mail to the terminal" would
"_just work_", then Mutt would be an unusable mess.

--
Egmont

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
