On Thu, Mar 29, 2007 at 01:05:57PM -0400, Rich Felker wrote:

> Gtk+-2's approach is horribly incorrect and broken. By default it
> writes UTF-8 filenames into the filesystem even if UTF-8 is not the
> user's encoding.
There's an environment variable that tells Gtk+-2 to use the legacy
encoding in filenames. Whether forcing UTF-8 on filenames is a good
idea is genuinely questionable; you're right about that. But I'm not
just talking about filenames: many more strings are handled inside
Glib/Gtk+. Strings coming from gettext that will be displayed on the
screen, error messages originating from libc's strerror, strings typed
by the user into entry widgets, and so on. Gtk+-2 uses UTF-8
everywhere, and (except for the filenames) that's clearly a wise
decision.

> Not independently. All they have to do is convert it to the local
> encoding. And yes I'm quite aware that a lot of information might be
> lost in the process. That's fine. If users want to be able to read
> multilingual text, they NEED to migrate to a character encoding that
> supports multilingual text. Trying to "work around" this [non-]issue
> by mixing encodings and failing to respect LC_CTYPE is a huge hassle
> for negative gain.

I think this is just plain wrong. How long have you been browsing the
net and reading accented pages? How long have you been using a UTF-8
locale? I used Linux with a Latin-2 locale from 1996 on. It was around
2003 that I began using UTF-8 occasionally, and only last year that I
finally managed to switch fully to UTF-8. There are still several
applications that are a nightmare with UTF-8 (Midnight Commander, for
example). A few years ago the situation was much worse: much software
was simply not ready for UTF-8, and switching would have been nearly
impossible. When did you switch to Unicode? Probably a few years
earlier than I did, but I bet you also had those old-fashioned 8-bit
days...

So, I used Linux for 10 years with an 8-bit locale set up. Still I
could visit French, Japanese etc. pages and the letters appeared
correctly. Believe me, I would have switched to Windows or whatever if
Linux browsers hadn't been able to perform this pretty simple job.
It's not about workarounds or non-issues.
If a remote server tells my browser to display a kanji, then my
browser _must_ display a kanji, even if my default charset doesn't
contain it. Having an old-fashioned system configuration is no excuse
for an application not to display the characters properly. (Except for
terminal applications, which are forced to use the charset I use.)

> > Show me your code that you think "just works" and I'll show you
> > where you're wrong. :-)
>
> Mutt is an excellent example.

As you might see from the headers of my messages, I'm using Mutt too.
In this regard Mutt is a nice piece of software that handles accented
characters correctly (nearly) always. In order to do this, it has to
be aware of the charset of each message (and of its parts) and the
charset of the terminal, and it has to convert between them plenty of
times. The fact that it does its job (mostly) correctly implies that
the authors didn't just write "blindly copy the bytes from the message
to the terminal" kinds of functions; they took charset issues into
account and converted the strings whenever necessary.

From a user's point of view, accent handling in Mutt "just works".
This is because the developers took care of it. If the developers had
thought that "copying those bytes from the mail to the terminal" would
"_just work_", then Mutt would be an unusable mess.

--
Egmont

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
