On Saturday, 22 October 2005 13:46, Tommy Pettersson wrote:
> On Fri, Oct 21, 2005 at 02:58:10PM +0200, Wolfgang Jeltsch wrote:
> > how does darcs handle non-ASCII characters in filenames, patch names and
> > long comments? What happens, for example, if someone who uses a
> > different character encoding than me fetches a copy of my repository via
> > darcs get? Will non-ASCII characters be properly interpreted?
>
> When outputting to the terminal darcs is supposed to escape
> everything that is not printable ASCII by default.

Yes, it does so.

> There are some environment variables described in section 'Character
> escaping and non-ASCII character encodings' in the manual
> <http://www.darcs.net/manual/> to allow 8-bit chars and UTF8.

I use a UTF-8 locale and tried DARCS_DONT_ESCAPE_8BIT=1. Alas, darcs still
prints non-ASCII characters in escaped form, but I think this is because I
use darcs 1.0.2.

> Darcs does not interpret the encoding of patch names or patch comments, so
> the output can probably look garbage if looked at with a wrong locale.

That's probably bad. Patch names and patch comments are character sequences.
Therefore they probably shouldn't be handled as mere byte sequences; the
encoding should be taken into account. For example, if I commit a patch with
a name containing non-ASCII characters using a Latin-1 locale, the patch name
should be the same for a user using a UTF-8 locale.

> The same should be true for file names, but I believe there is some
> interference with a Haskell module that uses UTF8, and I've sometimes seen
> UTF8 in output of filenames even though I use Latin1, which should be
> regarded as bugs.

And I see "double-UTF-8" in output. I use a UTF-8 locale. darcs takes my
UTF-8 encoded filenames, interprets them as Latin-1 encoded and does a
Latin-1-to-UTF-8 conversion before outputting them on the terminal. So a
character like ä is then represented by 4 bytes.

Well, this might be just a presentational problem. I suppose that darcs
stores the byte sequence which makes up a filename verbatim, not taking any
encodings into account. In other words, for darcs, filenames seem to be byte
sequences instead of character sequences. The question is whether this is
good behavior. At least it avoids problems if, for example, a Makefile refers
to a file. If filenames were treated as character sequences, the underlying
byte sequence would change if a different encoding were used.
But the byte sequence in the Makefile won't change, so the Makefile won't
work correctly anymore.

Is it really true that filenames are just byte sequences for darcs and no
character encodings are taken into account when storing and retrieving
filenames? Or are filenames treated as character sequences? Or are they
treated non-uniformly, which would mean that darcs is buggy at this point and
one should avoid filenames with non-ASCII characters?

Best wishes,
Wolfgang
_______________________________________________
darcs-users mailing list
http://www.abridgegame.org/mailman/listinfo/darcs-users
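
To illustrate the "double-UTF-8" effect and the byte-versus-character distinction discussed in this message, here is a minimal sketch in Python. It is not darcs code (darcs is written in Haskell). It only assumes that the conversion is exactly what is described: UTF-8 bytes misread as Latin-1 and then re-encoded as UTF-8. The function names `double_utf8` and `undo_double_utf8` are hypothetical.

```python
def double_utf8(name: str) -> bytes:
    """Encode as UTF-8, misread the bytes as Latin-1, re-encode as UTF-8."""
    utf8_bytes = name.encode("utf-8")
    # Latin-1 maps every byte value to exactly one code point,
    # so this misreading never fails; it just yields the wrong characters.
    misread = utf8_bytes.decode("latin-1")
    return misread.encode("utf-8")


def undo_double_utf8(data: bytes) -> str:
    """Reverse the accidental conversion (only valid for double-encoded input)."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


char = "\u00e4"  # the character ä

# The same character is a different byte sequence in each encoding,
# which is why a Makefile that stores the bytes would stop matching.
print(char.encode("latin-1"))  # one byte
print(char.encode("utf-8"))    # two bytes

# After the accidental second conversion the character occupies four bytes.
print(double_utf8(char))
```

Under that assumption the damage is reversible, since `undo_double_utf8(double_utf8(s))` gives back `s`. That would be consistent with this being only a presentational problem, provided the stored bytes themselves are left untouched.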
