On Friday, 4 November 2005 13:18, David Roundy wrote:
> [...]

> That proposal would only make sense if darcs treated filenames as unicode
> character strings.  Instead, darcs treats them as byte sequences, which in
> spite of the behavior of KDE and Gnome is what they actually are.

I don't think it is quite correct to say that filenames are byte sequences.
You are thinking from a POSIX point of view, where this is the case.  But as
far as I know, it is different on other systems.

Actually, I'd say that the POSIX behavior is the "wrong" behavior because a 
name should be a name.  A name is commonly defined as a sequence of 
characters.  So a filename ABC should mean a sequence of three characters A, 
B and C and not a sequence of three bytes 65, 66 and 67 or something similar.  
So ideally, it should be transparent to the user how his/her system encodes 
filenames as byte sequences.  From a user's point of view a filename should 
always be the same sequence of characters, not bytes.

This approach contradicts what POSIX does.  Under POSIX a filename is a
byte sequence, and its interpretation as a character sequence depends on
the current locale, which might change.  So if we see a filename as a
sequence of characters, changing the locale on POSIX might change the name,
which is a bad thing, as you pointed out.
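
To make the POSIX case concrete, here is a minimal Python sketch.  The byte
string and the two encodings are illustrative assumptions of mine, standing
in for "the stored name" and "two different locales"; nothing here is taken
from darcs itself.

```python
# Bytes as a POSIX filesystem might store them (illustrative value only).
raw_name = b"caf\xc3\xa9"

# Interpreted under a UTF-8 locale: four characters.
as_utf8 = raw_name.decode("utf-8")

# Interpreted under a Latin-1 locale: the same bytes become five characters.
as_latin1 = raw_name.decode("latin-1")

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))

# The stored bytes never changed; only their interpretation did.
assert as_utf8 != as_latin1
```

The bytes on disk are identical in both cases, so a tool that compares byte
sequences sees one name, while a user switching locales sees two.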

But isn't there the opposite problem under Windows?  As far as I know,
Windows always uses the same encoding (UTF-16) for "long" filenames,
independent of the current locale.  To get a filename's character sequence
you have to decode the stored byte sequence according to UTF-16.  Changing
the locale doesn't change the character sequence.  But what if you want to
interpret a name as a sequence of bytes, as darcs does?  You would normally
take the character sequence and encode it according to the current locale.
If you change the locale now, you change the byte sequence, i.e., in darcs
terms, you change the filename, which is what you wanted to avoid.
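
The reverse situation can be sketched the same way.  Again this is only an
illustration under my own assumptions: the name is made up, and the two
code pages (cp1252 and cp437) merely stand in for two different locales.

```python
# A name as a fixed character sequence, as Windows would present it.
name = "caf\u00e9"

# The stored form is locale-independent (UTF-16, little-endian here).
stored = name.encode("utf-16-le")

# Turning the characters into bytes according to the "current locale"
# gives different results for different code pages.
bytes_cp1252 = name.encode("cp1252")
bytes_cp437 = name.encode("cp437")

print(stored)
print(bytes_cp1252)
print(bytes_cp437)

# Same characters, different byte-sequence "filenames".
assert bytes_cp1252 != bytes_cp437
```

Here the character sequence is stable, but a program that insists on byte
sequences gets a different name whenever the locale changes.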

Maybe it would be best to let the user decide which files/directories
should have byte-sequence names and which should have character-sequence
names.  If a user changes the locale, his/her names might change, depending
on which decision concerning byte/character sequences he/she has made.  But
this wouldn't be much of a problem, provided that the pristine cache uses a
locale-independent format.  It would just introduce a few modifications to
the source tree, but no repository corruption.  The changes to the source
tree might not be what the user wanted, but it would be the user's
responsibility to avoid such trouble.

> [...]

Best wishes,
Wolfgang

_______________________________________________
darcs-users mailing list
[email protected]
http://www.abridgegame.org/mailman/listinfo/darcs-users
