Nuno Lucas wrote:
> On 11/22/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> If we use an internal line ending standard, we should consider the
>> possibility of using the standard newline character NEL, "Next Line",
>> 0x85, unicode U+0085.
> You are forgetting I can (and actually I am) versioning C files with
> text comments using some code page other than ASCII (in my case
> IBM-860, because it's a port from a MS-DOS program, and the original
> programmer was Portuguese).

Not really, because in UTF-8 all chars bigger than U+007F are (at least)
two bytes long: U+0085 encodes as "\xC2\x85", while "\x85" is not valid
UTF-8 (only multi-byte chars can use the high bit of the byte).
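
A quick way to check the byte-level claim, as a minimal Python sketch. The helper name is_valid_utf8 is made up for illustration, and "cp860" is Python's codec name for IBM-860:

```python
# U+0085 (NEL) becomes two bytes in UTF-8.
nel_utf8 = "\u0085".encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0x85 byte is a continuation byte with no lead byte, so it is rejected.
lone_byte_ok = is_valid_utf8(b"\x85")

# The same byte read as IBM-860 is an ordinary character; once converted to
# UTF-8 it occupies more than one byte and can no longer collide with NEL.
converted = b"\x85".decode("cp860").encode("utf-8")

print(nel_utf8, lone_byte_ok, converted)
```

So the clash with an 8-bit code page only exists if the bytes are stored unconverted.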

In fact using NEL would be nice because it has the correct "meaning"
(while \n at least in principle means "next line, same column"), but it
opens a whole new can of worms: if we use that, text files MUST be
converted to some uniform Unicode format (I'd say UTF-8).
Which means that we MUST know the "origin" format on checkin and use
iconv().
And which means that on checkout there may well be cases when the
charset chosen by the user is simply not capable of encoding the
repository content; which supposedly means either you error out and ruin
the life of that developer or you lose some data on the next commit.
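
To make the checkout problem concrete, here is a rough Python sketch. The functions checkin and checkout are hypothetical names, not monotone code, and Python's codecs stand in for iconv():

```python
def checkin(raw: bytes, origin_charset: str) -> bytes:
    """Convert working-copy bytes to the uniform internal format (UTF-8 assumed)."""
    return raw.decode(origin_charset).encode("utf-8")


def checkout(stored: bytes, target_charset: str, errors: str = "strict") -> bytes:
    """Convert internal UTF-8 back to the charset the user asked for."""
    return stored.decode("utf-8").encode(target_charset, errors=errors)


# A Portuguese fragment as it would sit in an IBM-860 source file.
original = "\u00e7\u00e3o".encode("cp860")
stored = checkin(original, "cp860")

# Same charset on both sides: the round trip is lossless.
same_charset = checkout(stored, "cp860")

# A charset that cannot represent the content: either error out...
try:
    checkout(stored, "ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# ...or substitute characters, which loses data on the next commit.
lossy = checkout(stored, "ascii", errors="replace")
```

Neither outcome is pleasant, which is the whole point of the objection.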

> Don't mix character encoding problems with the end-of-line issue. They
> are very different beasts.

I do agree, let's do line-endings first (as a "should be done" thing for
the user), and think about charset conversion only later, and SURELY as
an optional part.
(e.g.: if you define the "charset" attribute it will get converted to
the internal format, but nothing will be done in any other case, or by
default)
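
Something along these lines, as a sketch only. The attribute lookup is modelled as a plain dict, the function name to_internal is invented for illustration, and UTF-8 as the internal format is an assumption:

```python
from typing import Mapping

INTERNAL_CHARSET = "utf-8"  # assumed internal format


def to_internal(raw: bytes, attrs: Mapping[str, str]) -> bytes:
    """Convert only when a "charset" attribute is set; otherwise pass through."""
    charset = attrs.get("charset")
    if charset is None:
        # Default case: the bytes are stored exactly as the user gave them.
        return raw
    return raw.decode(charset).encode(INTERNAL_CHARSET)


raw = "\u00e7\u00e3o".encode("cp860")
untouched = to_internal(raw, {})
converted = to_internal(raw, {"charset": "cp860"})
```

Files without the attribute never go through a decoder, so nobody gets bitten unless they opt in.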

    Lapo



_______________________________________________
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel
