Re: guile can't find a chinese named file

Marko Rauhamaa Wed, 15 Feb 2017 23:17:01 -0800

Eli Zaretskii <[email protected]>:

> Btw, if by "UCS-2" you meant to say that only characters within the
> BMP are supported in file names on Windows, then this is wrong


No, I'm claiming Windows allows pathnames to contain isolated (unpaired)
surrogates, which are not valid UTF-16 and so cannot be decoded back to
Unicode.

The situation is completely analogous to Linux pathnames that can
contain illegal UTF-8.
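
To make the analogy concrete, here is a minimal Python sketch (Python is
used only for illustration; the byte values are made-up examples, not
taken from any real filename). It shows that a strict decoder rejects
both an unpaired UTF-16 surrogate and an invalid UTF-8 byte, even though
the respective filesystems would accept such names.

```python
def decodes_strictly(raw: bytes, encoding: str) -> bool:
    """Return True if raw is well-formed in the given encoding."""
    try:
        raw.decode(encoding)  # default error handler is "strict"
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical Windows-style name: the 16-bit unit 0xD800 (a high
# surrogate) followed by "A" instead of a low surrogate.
lone_surrogate_name = b"\x00\xd8" + "A".encode("utf-16-le")

# Hypothetical Linux-style name: 0xFF never occurs in well-formed UTF-8.
bad_utf8_name = b"caf\xff"

utf16_ok = decodes_strictly(lone_surrogate_name, "utf-16-le")
utf8_ok = decodes_strictly(bad_utf8_name, "utf-8")
print(utf16_ok, utf8_ok)
```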

> : since Windows XP, NTFS volumes support file names with characters
> outside of the BMP. I've just successfully created files with such
> file names on Windows XP using Emacs.

Both Windows and Linux filenames support all of Unicode. Trouble is,
both of them support more than Unicode, making it impossible to use
Guile's strings for an arbitrary filename.

Python solves the problem by using a Unicode superset in its strings. I
think that's misguided, and Guile is correct in sticking to Unicode.
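
For reference, a small sketch of what that superset looks like in
practice, assuming CPython 3 and its "surrogateescape" error handler
(the mechanism it uses for filenames on POSIX systems). The filename
bytes below are a made-up example. Undecodable bytes are smuggled into
the string as lone surrogates, so the string round-trips to the original
bytes but is no longer valid Unicode text.

```python
# Hypothetical filename containing a byte that is not valid UTF-8.
raw_name = b"report-\xff.txt"

# The undecodable byte 0xFF becomes the lone surrogate U+DCFF.
as_str = raw_name.decode("utf-8", errors="surrogateescape")
has_lone_surrogate = "\udcff" in as_str

# Encoding with the same handler restores the original bytes exactly.
round_trip = as_str.encode("utf-8", errors="surrogateescape")

# A strict encoder refuses the string, because it is not valid Unicode.
try:
    as_str.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

print(has_lone_surrogate, round_trip == raw_name, strict_encode_ok)
```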

If I understood it correctly, someone just told us Emacs maps illegal
UTF-8 to another form of illegal UTF-8 and back. That's better in that
it's bytes to bytes (leaving Unicode out), but it's not immediately
obvious to me why you have to transform the byte sequence at all.

Look at the problem of concatenation. We could have a case where two
illegal UTF-8 (or UTF-16) snippets are concatenated to get valid UTF-8
(or UTF-16). That operation fails if you try to translate the snippets
to strings before concatenation. Such concatenation operations are
commonplace when dealing with filenames (e.g., split(1)).
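
The concatenation case can be sketched as follows, again in Python and
with a made-up two-character name. The cut point is chosen to fall
inside a multi-byte character, much as a byte-oriented tool might cut
it. Each half is illegal UTF-8 on its own and fails strict decoding,
while the byte-level concatenation is perfectly valid.

```python
def decodes_strictly(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Two CJK characters, three UTF-8 bytes each (six bytes in total).
whole = "\u6587\u4ef6".encode("utf-8")

# Cut after the second byte, i.e. in the middle of the first character.
first, second = whole[:2], whole[2:]

first_ok = decodes_strictly(first)    # truncated multi-byte sequence
second_ok = decodes_strictly(second)  # starts with a continuation byte

# Concatenating as bytes first, and decoding afterwards, works.
joined = (first + second).decode("utf-8")

print(first_ok, second_ok, joined == "\u6587\u4ef6")
```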


Marko
