On 2006-09-18, Tuomo Valkonen <[EMAIL PROTECTED]> wrote:
> On 2006-09-17, Sly Gryphon <[EMAIL PROTECTED]> wrote:
>> I think it would be good for darcs to include better support for Unicode.
>
> I think it would be _essential_ for darcs to include better support for
> _locales_. 

Some clarifications:

  * Although the original message wasn't about it, I'm mostly concerned
    with _metadata_ here: patch author name, patch description, and so
    on. It should not be necessary to tell people how to write their
    name! Supporting locales in patch metadata is _trivial_ compared to
    supporting them elsewhere. The changes are primarily in the interface
    (input from the user, and output), aside from a slightly altered
    patch format, and possibly support for older patches that do not
    have an encoding specified for the metadata.

  * As for support for file names with non-ASCII contents, there are too
    many problems with non-ASCII file names on *nix: there's no way to
    know the encoding used on a file system, and different users, or
    even a single user, might use multiple locales simultaneously on a
    system. I think one should stick to ASCII in the names of files with
    wide distribution.

    Nevertheless, if we make the quite reasonable assumption that file
    names are in the current locale encoding, support for conversions is
    doable with some annoying side effects, and in fact within the
    framework of patch theory itself. What is needed is for darcs to
    generate a viewing transformation patch that renames files to some
    unused names if their names cannot be represented in the current
    locale. Then everything just gets commuted over this. The difficulty
    is that other files might refer to the renamed files. But by allowing
    the user to record new viewing transformations that do not get pushed
    to other repositories, this can be fixed. It is a bit cumbersome,
    though, so I'd simply forget about supporting different encodings in
    file names.

  * In principle, we could use the same viewing transformation, with
    special 'iconv' patches to convert other files to the local encoding.
    However, most of the time it is the text editor that should handle
    the different encodings, since some files are supposed to be in a
    given encoding. Thus, I don't think encodings should be supported for
    file contents, at least not unless this is explicitly specified.

  * UTF-16 is, of course, a rather different case than just a change in
    encoding. The way I'd go about it is to make the current patch type
    polymorphic over arbitrary character types, if it isn't already, and
    add skeleton support for plugging in and specifying different patch
    types for files of arbitrary formats. (So, one day, support could be
    written for structured formats to have structural instead of
    line-based patches, and so on.)
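
To make the metadata point concrete, here is a minimal sketch in Python,
purely for illustration. It is not darcs code: the function names, the
idea of storing an encoding tag next to the raw bytes, and the latin-1
fallback for untagged (older) patches are all my own assumptions.

```python
import locale

# Assumption: old patches carry no encoding tag, so some fallback is needed.
# latin-1 is chosen here only because it can decode any byte sequence.
LEGACY_ENCODING = "latin-1"


def encode_metadata(text, encoding="utf-8"):
    """Return (encoding_tag, raw_bytes) as a patch might store them."""
    return encoding, text.encode(encoding)


def decode_metadata(raw, encoding=None):
    """Decode stored metadata; encoding=None means an untagged old patch."""
    if encoding is None:
        encoding = LEGACY_ENCODING
    return raw.decode(encoding, errors="replace")


def display_metadata(raw, encoding=None, output_encoding=None):
    """Convert stored metadata into bytes for the user's current locale."""
    if output_encoding is None:
        output_encoding = locale.getpreferredencoding(False)
    text = decode_metadata(raw, encoding)
    # Characters the locale cannot represent are shown as '?'.
    return text.encode(output_encoding, errors="replace")


if __name__ == "__main__":
    tag, raw = encode_metadata("\u00c5sa \u00d6berg", "iso-8859-15")
    print(tag, display_metadata(raw, tag, "utf-8"))
```

The only places that change are where text enters (recording) and
leaves (display); the stored bytes themselves are never rewritten.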

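The file name idea above (rename whatever the current locale cannot
represent to some unused name) can be sketched the same way. Again this
is Python for illustration only, with made-up function names, and it
only computes the rename mapping; it says nothing about how darcs would
commute other patches over it. Replacing unrepresentable characters
with '_' and adding a numeric suffix on collision are my assumptions.

```python
def representable(name, encoding):
    """True if name can be encoded in the given locale encoding."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def viewing_renames(names, encoding):
    """Map each unrepresentable name to an unused, representable stand-in."""
    # Names that are already fine keep their place and must not be clobbered.
    used = {n for n in names if representable(n, encoding)}
    renames = {}
    for name in sorted(names):
        if name in used:
            continue
        # Replace every character the encoding cannot express with '_'.
        base = "".join(c if representable(c, encoding) else "_" for c in name)
        candidate, counter = base, 1
        while candidate in used:
            candidate = f"{base}.{counter}"
            counter += 1
        used.add(candidate)
        renames[name] = candidate
    return renames


if __name__ == "__main__":
    names = ["README", "caf\u00e9.txt", "caf_.txt"]
    print(viewing_renames(names, "ascii"))    # {'café.txt': 'caf_.txt.1'}
    print(viewing_renames(names, "latin-1"))  # {}
```

The mapping is local to one repository and one locale, which is why it
should not be pushed anywhere else.
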
-- 
Tuomo

