Eli Zaretskii <[email protected]>:

> Btw, if by "UCS-2" you meant to say that only characters within the
> BMP are supported in file names on Windows, then this is wrong

No, I'm claiming Windows allows pathnames to contain isolated surrogate
code points, which cannot be decoded back to Unicode with UTF-16. The
situation is completely analogous to Linux pathnames that can contain
illegal UTF-8.

> : since Windows XP, NTFS volumes support file names with characters
> outside of the BMP. I've just successfully created files with such
> file names on Windows XP using Emacs.

Both Windows and Linux filenames support all of Unicode. Trouble is,
both of them support more than Unicode, making it impossible to use
Guile's strings for an arbitrary filename.

Python solves the problem by using a Unicode superset in its strings. I
think that's misguided, and Guile is correct in sticking to Unicode.

If I understood it correctly, someone just told us Emacs maps illegal
UTF-8 to another form of illegal UTF-8 and back. That's better in that
it's bytes to bytes (leaving Unicode out), but it's not immediately
obvious to me why you have to transform the byte sequence at all.

Look at the problem of concatenation. We could have a case where two
illegal UTF-8 (or UTF-16) snippets are concatenated to get valid UTF-8
(or UTF-16). That operation fails if you try to translate the snippets
to strings before concatenation. Such concatenation operations are
commonplace when dealing with filenames (e.g., split(1)).


Marko
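
P.S. As a rough illustration of the concatenation case, here is a
minimal Python sketch using only the standard library. The file name,
the split point, and the helper is_valid_utf8 are made up for the
example. The last two lines use Python's "surrogateescape" error
handler, which I take to be the Unicode superset in question; treat
that part as an assumption about what is being compared, not as a
statement about how Guile or Emacs behave.

```python
# A valid UTF-8 file name containing one two-byte character (U+00E9).
name = "caf\u00e9.txt".encode("utf-8")      # b'caf\xc3\xa9.txt'

# Split in the middle of the two-byte sequence, as a byte-oriented
# tool such as split(1) might do.
head, tail = name[:4], name[4:]             # b'caf\xc3' and b'\xa9.txt'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Each piece alone is illegal UTF-8, so strict decoding to a Unicode
# string fails before any concatenation can happen.
print(is_valid_utf8(head))                  # False
print(is_valid_utf8(tail))                  # False

# Concatenating the raw bytes gives back the valid original name.
print(is_valid_utf8(head + tail))           # True
print(head + tail == name)                  # True

# With a Unicode superset (lone surrogates standing in for bad bytes),
# the pieces can be decoded and joined, but the joined string is not
# the string you get by decoding the whole name.
joined = (head.decode("utf-8", "surrogateescape")
          + tail.decode("utf-8", "surrogateescape"))
print(joined == name.decode("utf-8"))                       # False
print(joined.encode("utf-8", "surrogateescape") == name)    # True
```

Keeping the pieces as bytes until the very end avoids the problem
entirely, which is the point about concatenation made above.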
