On Friday, 4 November 2005 at 13:18, David Roundy wrote:
> [...]
> That proposal would only make sense if darcs treated filenames as unicode
> character strings. Instead, darcs treats them as byte sequences, which in
> spite of the behavior of KDE and Gnome is what they actually are.

I don't think it is quite correct to say that filenames are byte sequences. You are thinking from a POSIX point of view, where this is the case, but as far as I know it is different on other systems. Actually, I'd say that the POSIX behavior is the "wrong" one, because a name should be a name, and a name is commonly defined as a sequence of characters. So a filename ABC should mean a sequence of the three characters A, B and C, not a sequence of the three bytes 65, 66 and 67 or something similar. Ideally, how a system encodes filenames as byte sequences should be hidden from the user. From a user's point of view, a filename should always be the same sequence of characters, not bytes.

This approach is contrary to what POSIX does. Under POSIX a filename is a byte sequence, and its interpretation as a character sequence depends on the current locale, which might change. So if we see a filename as a sequence of characters, changing the locale on POSIX might change the name, which is a bad thing, as you pointed out.

But isn't there the opposite problem under Windows? As far as I know, Windows always uses the same encoding (UTF-16) for "long" filenames, independently of the current locale. To get a filename's character sequence, you have to decode the stored byte sequence according to UTF-16. Changing the locale doesn't change the character sequence. But what if you want to interpret a name as a sequence of bytes, as darcs does? You would normally take the character sequence and encode it according to the current locale. If you now change the locale, you change the byte sequence, i.e., in darcs terms, you change the filename, which is what you wanted to avoid.

Maybe it would be best to let the user decide which files/directories shall have byte-sequence names and which shall have character-sequence names. A user who changes the locale might thereby change his/her names, depending on which decision concerning byte/character sequences he/she has made.

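The two situations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the filename is made up, UTF-8 and Latin-1 stand in for two different locales, and none of the variable names come from darcs.

```python
# Illustration only: the filename and the two encodings are arbitrary choices.
name = "M\u00fcller.txt"  # hypothetical filename "Müller.txt" as characters

# Case 1 (POSIX as described above): the stored bytes are fixed,
# and the characters you see depend on the locale's encoding.
posix_bytes = name.encode("utf-8")
seen_in_utf8_locale = posix_bytes.decode("utf-8")
seen_in_latin1_locale = posix_bytes.decode("latin-1")
print(seen_in_utf8_locale == seen_in_latin1_locale)  # False: the "name" changed

# Case 2 (Windows long names as described above): the stored form is
# always UTF-16, so the character sequence is locale-independent ...
stored = name.encode("utf-16-le")
chars = stored.decode("utf-16-le")
print(chars == name)  # True

# ... but a byte-sequence view obtained by encoding with the locale's
# encoding changes when the locale changes.
bytes_in_utf8_locale = chars.encode("utf-8")
bytes_in_latin1_locale = chars.encode("latin-1")
print(bytes_in_utf8_locale == bytes_in_latin1_locale)  # False
```

In the first case the bytes stay put and the characters move; in the second the characters stay put and the bytes move. Whichever view a tool treats as "the filename", the other view is the one that shifts with the locale.
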
But such name changes after a locale switch wouldn't be much of a problem, provided that the pristine cache uses a locale-independent format. They would just introduce a couple of modifications to the source tree, but no repository corruption. The changes to the source tree might not be what the user wanted, but it would be the user's responsibility to avoid such trouble.

> [...]

Best wishes,
Wolfgang
_______________________________________________
darcs-users mailing list
http://www.abridgegame.org/mailman/listinfo/darcs-users
