On Sat, Mar 17, 2007 at 08:25:43AM +0000, Colin Paul Adams wrote:
> Now this is where it gets interesting.
> My URI resolver translates the file name (the URI is relative to a
> base file: URI) into a UTF-8 byte sequence which gets passed to the
> fopen call (the program is supposed to work on other O/Ses too, not
> just Linux, but I'll worry about that later).
> 
> The test suite is currently distributed as a zip file. It so happens
> that the file concerned is named using ISO-8859-1 on the distributor's
> system. On my system, doing ls from the GNOME console shows the name
> as xgespr?ch.xml. Whereas Emacs dired shows the name as
> xgespräch.xml.
> 
> I'm not sure exactly how fopen is supposed to handle the situation.

It's not. You should not create files in your filesystem with the
wrong encoding. If you do, then the only way to access them is via
whatever the (invalid) byte sequence is.
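
To make the mismatch concrete, here is a minimal Python sketch (mine, not
part of your setup). It assumes unzip wrote the name to disk as raw
ISO-8859-1 bytes, which is what your description suggests. The bytes your
resolver passes to fopen are not the bytes on disk, and the on-disk bytes
are not valid UTF-8 at all, which would explain the '?' from ls.

```python
# Hypothetical illustration: the intended name, with a-umlaut (U+00E4).
name = "xgespr\u00e4ch.xml"

on_disk = name.encode("iso-8859-1")  # assumed: the bytes unzip wrote
requested = name.encode("utf-8")     # what a UTF-8 resolver hands to fopen

print(on_disk)    # b'xgespr\xe4ch.xml'
print(requested)  # b'xgespr\xc3\xa4ch.xml'


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether a raw filename decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(on_disk == requested)      # False: fopen looks for a name that is not there
print(is_valid_utf8(on_disk))    # False: 0xE4 is not followed by continuation bytes
```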

> Anyway, the test failed - not surprisingly.
> I looked at the unzip man page, to see if there was any filename
> translation option. I couldn't find one.

Yes, the problem here is the unzip command. It should provide a way to
translate filenames...

> So I tried unzipping the distribution afresh, but this time with
> LANG=en_GB.

That won't help. You can't mix encodings in the filesystem and expect
any reasonable behavior.

> Emacs still showed the same name, ls however showed a completely
> different character (it looked like it might be Arabic to me - I don't
> know).
> 
> The test still failed.
> 
> So I went back to LANG=en_GB.UTF-8, unzipped the distribution again,
> and re-named the file, thanks to your help.

Yep, this is the only reasonable fix until the unzip command is fixed
to handle foreign encodings.
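
If there turn out to be many such files, the rename can be scripted. The
following is only a sketch under the assumption that every non-UTF-8 name
in the directory really is ISO-8859-1; the function names are made up for
illustration. It works on byte paths so that Python never tries to decode
the broken names itself.

```python
import os


def latin1_name_to_utf8(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8, else treat it as ISO-8859-1."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 was written as ISO-8859-1.
        return raw.decode("iso-8859-1").encode("utf-8")


def fix_names(directory: bytes) -> None:
    """Rename every entry of one directory whose name is not valid UTF-8."""
    for raw in os.listdir(directory):  # bytes in, bytes out
        fixed = latin1_name_to_utf8(raw)
        if fixed != raw:
            os.rename(os.path.join(directory, raw),
                      os.path.join(directory, fixed))
```

You would call it as fix_names(b"/path/to/testsuite") once per directory.
The validity check is a heuristic: a Latin-1 name whose bytes happen to
form valid UTF-8 is left alone, so look over the result afterwards.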

> ls now shows the correct file name. Emacs shows
> xgespräch.xml. And the test works.

Emacs is still decoding filenames as Latin-1. Tell it to use UTF-8:

(setq file-name-coding-system 'utf-8)
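
What Emacs displayed is what you would expect if it read the (now
correct) UTF-8 name as Latin-1. A small Python sketch of that presumed
misreading, with a helper name invented for the example:

```python
def as_latin1_reader_sees(name: str) -> str:
    """Encode a name as UTF-8, then misread those bytes as ISO-8859-1."""
    return name.encode("utf-8").decode("iso-8859-1")


shown = as_latin1_reader_sees("xgespr\u00e4ch.xml")

# "\u00c3\u00a4" is "Ã¤", the two characters displayed in place of "ä".
print(shown == "xgespr\u00c3\u00a4ch.xml")  # True
```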

~Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
