On Thu, Nov 03, 2005 at 10:33:04PM +0100, Thomas Zander wrote:
> On Tuesday 25 October 2005 14:52, David Roundy wrote:
> > I also believe that treating filenames as byte sequences is correct.
> > There's no requirement on posix systems that filenames either can be or
> > are represented in your current locale.
> 
> I don't know posix at all; but it's good to point out that the major DEs
> (KDE as well as Gnome) at least follow the position that having a locale
> with UTF8 means the files/dirs on disk are going to be utf8.
> 
> For example; if I select a system wide LANG of en_GB.UTF-8 any new dir
> will be written in utf8 on disk by konqueror (after next login :)
> 
> So I do propose that darcs follows the locale and use utf8 on disc if the 
> locale specifies it to do so.

That proposal would only make sense if darcs treated filenames as unicode
character strings.  Instead, darcs treats them as byte sequences, which, in
spite of the behavior of KDE and Gnome, is what they actually are.  KDE and
Gnome can afford to pretend, since all they care about is how to display
the characters.  Darcs cares about their actual names, and if it gets the
names wrong, you get repository corruption, which makes me unhappy.
Optimistic thinking is not a good feature in a revision control system.
(Annoyingly enough, there are programming languages that seem to like
playing the same game of make-believe... it makes writing robust code a
real pain.)

The operating system does no conversions, so another user with a different
(non-UTF8) locale will see your files as a different set of unicode
characters.  If that user runs darcs get (this being a local disk),
that user will end up with files that have different unicode strings, but
the same byte sequences.  With darcs as-is, that is correct (albeit
possibly disconcerting).  If darcs treated filenames as unicode strings,
the user who ran the darcs get will now have a corrupt repository.  If she
makes some changes and records them, you'll get an error when pulling to
your repository.  Oh joy.
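
To make that concrete, here is a minimal sketch in Python (not darcs code;
the filename and the two encodings are just assumed for illustration).  It
shows one byte sequence read under two locales, and what happens if a tool
keeps the decoded string and writes it back out under a different encoding:

```python
# The bytes that a UTF-8 user's tools wrote to disk for the name "café".
name_bytes = "café".encode("utf-8")        # b'caf\xc3\xa9'

# Two users, two locales, one byte sequence on disk.
seen_in_utf8 = name_bytes.decode("utf-8")      # 'café'
seen_in_latin1 = name_bytes.decode("latin-1")  # 'cafÃ©'

# Byte-oriented handling: whatever the name looks like, the bytes that get
# copied are identical, so the two repositories agree.
copied_as_bytes = bytes(name_bytes)

# String-oriented handling (hypothetical): the Latin-1 user's string is
# later encoded by someone in a UTF-8 locale, producing a different name.
reencoded = seen_in_latin1.encode("utf-8")     # b'caf\xc3\x83\xc2\xa9'

print(seen_in_utf8, seen_in_latin1)
print(copied_as_bytes == name_bytes, reencoded == name_bytes)
```

The byte copy matches the original; the re-encoded name does not, and that
mismatch is the sort of thing that would show up as an error on pull.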

Of course there are also issues such as the fact that the locale encoding
can be changed (which doesn't modify the filenames on disk), the existence of
removable media, etc.  Pretending that everything is in the locale encoding
really is a big game of make-believe, and only works because if it's wrong
nothing bad happens (your files look like they've got funny names, or their
names can't be displayed at all).
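
A small Python sketch of the "can't be displayed at all" case, assuming a
name that was written under Latin-1 and is later read under UTF-8 (the
helper try_display is made up for this example):

```python
# A name written to disk by someone using a Latin-1 locale.
latin1_name = "café".encode("latin-1")     # b'caf\xe9'

def try_display(raw, encoding):
    """Return the name decoded in `encoding`, or None if it is not valid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Under a UTF-8 locale these bytes are not a valid string at all.
shown = try_display(latin1_name, "utf-8")

# A display program can substitute a placeholder and move on, but the
# placeholder string no longer identifies the file.
lossy = latin1_name.decode("utf-8", errors="replace")

print(shown, repr(lossy), lossy.encode("utf-8") == latin1_name)
```

For a file manager the placeholder is merely ugly.  For anything that has
to find the file again by name, the original bytes are the only safe key.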

As I said before, I like the idea of an *optional* filename encoding
conversion in darcs, but it needs to be both optional and explicit, so that
when the user runs a local darcs get on your UTF-8-encoded repository, the
resulting repository is also UTF-8-encoded, regardless of what that user's
locale is.  It had also better be true that changing your locale doesn't
affect the encoding of a repository on disk.
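
Roughly what I have in mind, as a Python sketch rather than anything darcs
actually does.  The names clone_names and display_name, and the idea of a
per-repository encoding setting, are all hypothetical:

```python
from typing import List, Optional

def clone_names(names: List[bytes]) -> List[bytes]:
    """A local get copies names byte for byte; no locale is consulted."""
    return [bytes(n) for n in names]

def display_name(raw: bytes, repo_encoding: Optional[str]) -> str:
    """Decode only with the encoding the repository explicitly declares.

    With no declared encoding (the default), non-ASCII bytes are shown as
    escapes instead of being guessed at from the user's locale.
    """
    if repo_encoding is None:
        return raw.decode("ascii", errors="backslashreplace")
    return raw.decode(repo_encoding)

original = ["café".encode("utf-8")]
cloned = clone_names(original)

print(cloned == original)
print(display_name(cloned[0], "utf-8"))
print(display_name(cloned[0], None))
```

Nothing here looks at LANG or LC_ALL, so changing your locale changes
neither what is on disk nor how the repository interprets it.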
-- 
David Roundy
http://www.darcs.net

_______________________________________________
darcs-users mailing list
[email protected]
http://www.abridgegame.org/mailman/listinfo/darcs-users
