On 2006-09-18, Tuomo Valkonen <[EMAIL PROTECTED]> wrote:
> On 2006-09-17, Sly Gryphon <[EMAIL PROTECTED]> wrote:
>> I think it would be good for darcs to include better support for Unicode.
>
> I think it would be _essential_ for darcs to include better support for
> _locales_.
Some clarifications:
* Although the original message didn't concern them, I'm mostly
concerned with _metadata_ here: patch author name, patch description,
and so on. It should not be necessary to tell people how to
write their name! Supporting locales in patch metadata is _trivial_
compared to supporting them elsewhere. The changes are primarily in
the interface (input from the user, and output), aside from a slightly
altered patch format, and possibly support for older patches that do
not have an encoding specified for the metadata.
* As for support for file names with non-ASCII contents, there are too
many problems with files with non-ASCII names on *nix, as there's no way
to know the encoding used on a file system, and different users, or
even a single user, might use multiple locales simultaneously on a
system. I think one should stick to ASCII in names of files with wide
distribution.
Nevertheless, if we make the quite reasonable assumption that file
names are in the current locale encoding, support for conversions is
doable with some annoying side effects, and in fact within the framework
of patch theory itself. What is needed is for darcs to generate a
viewing transformation patch that renames files to some unused names
if their names cannot be represented in the current locale. Then
everything just gets commuted over this. The difficulty is that other
files might refer to the renamed files. But by allowing users to record
new viewing transformations that do not get pushed to other
repositories, this can be fixed. It is a bit cumbersome, though, so I'd
simply forget about supporting different encodings in file names.
* In principle, we could use the same viewing transformation, with
special 'iconv' patches to convert other files to the local encoding.
However, most of the time, it is the text editor that should handle
the different encodings, for some files are supposed to be in a given
encoding. Thus, I don't think encodings should be supported for file
contents, at least not unless this is explicitly specified.
* UTF-16 is, of course, a rather different case than just a change in
encoding. The way I'd go about it is to make the current patch type
polymorphic over input in arbitrary character types, if it isn't already,
and add skeleton support for plugging in and specifying different patch
types for files of arbitrary formats. (So, one day, support could be
written for structural formats to have structural instead of line-based
patches, and so on.)
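
To make the last point a bit more concrete, here is a rough sketch in
Python (not darcs code, and not how darcs represents patches). It shows
one diff routine that works over sequences of arbitrary items, and a
small registry for choosing a patch type per file. The names Hunk,
sequence_diff, PATCH_TYPES and diff_file, and the patch type labels, are
all made up for illustration.

```python
import difflib
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical hunk shape: (start index in old, removed items, added items).
Hunk = Tuple[int, list, list]


def sequence_diff(old: Sequence, new: Sequence) -> List[Hunk]:
    """Diff two sequences of arbitrary hashable items (byte lines, text lines, ...)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    hunks: List[Hunk] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            hunks.append((i1, list(old[i1:i2]), list(new[j1:j2])))
    return hunks


def bytes_line_diff(old: bytes, new: bytes) -> List[Hunk]:
    """Line-based diff on raw bytes, for ASCII-compatible encodings."""
    return sequence_diff(old.split(b"\n"), new.split(b"\n"))


def utf16_line_diff(old: bytes, new: bytes) -> List[Hunk]:
    """Line-based diff on decoded text, so lines are split on characters, not bytes."""
    return sequence_diff(old.decode("utf-16").split("\n"),
                         new.decode("utf-16").split("\n"))


# Skeleton registry: other patch types (structural ones, say) could be added here.
PATCH_TYPES: Dict[str, Callable[[bytes, bytes], List[Hunk]]] = {
    "bytes-lines": bytes_line_diff,
    "utf16-lines": utf16_line_diff,
}


def diff_file(patch_type: str, old: bytes, new: bytes) -> List[Hunk]:
    """Dispatch to the patch type that was specified for this file."""
    return PATCH_TYPES[patch_type](old, new)
```

The reason UTF-16 needs its own entry rather than a mere change of
encoding label is that a newline there is two bytes, so splitting the
raw bytes on a single newline byte cuts characters in half. The diff
routine itself stays the same; only the way the input is turned into a
sequence changes.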
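
As for the metadata point at the top, the handling I have in mind is
roughly the following. This is again only a Python sketch under
assumptions: the function names are hypothetical, and guessing UTF-8 and
then Latin-1 for old patches that carry no encoding is just one possible
policy, not something darcs does.

```python
import locale
from typing import Optional

# Assumption: last-resort encoding for old patches with no declared encoding.
LEGACY_FALLBACK = "latin-1"


def decode_metadata(raw: bytes, declared: Optional[str] = None) -> str:
    """Turn stored metadata bytes (author, description) into text."""
    if declared is not None:
        # New-style patch: the encoding is recorded alongside the metadata.
        return raw.decode(declared)
    # Old-style patch: no encoding recorded, so guess.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails.
        return raw.decode(LEGACY_FALLBACK)


def encode_for_output(text: str, encoding: Optional[str] = None) -> bytes:
    """Encode metadata for display, defaulting to the user's locale encoding."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # Characters the locale cannot represent are shown as '?'.
    return text.encode(encoding, errors="replace")
```

The point is that only the two ends change: input from the user is
decoded from the locale encoding, and output is encoded back into it.
Everything in between can work on text without caring where it came
from.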
--
Tuomo
_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel