On 25 June 2010 18:09, Owen Shepherd <owen.sheph...@e43.eu> wrote:
> The trouble is that UTF-8 is a poor standard. It bloats many texts, is
> quite expensive to parse, and has only one redeeming feature: It never
> creates embedded nulls. I suppose that it shares its encoding with
> ASCII is a feature too, but only a minor one.
>
> Personally, I think that most systems should adopt SCSU as their
> storage encoding, but that's unlikely to happen until C strings and
> MIME (two paragons of awfulness) die out.
>
> On 25 June 2010 16:00, Michael Richter <ttmrich...@gmail.com> wrote:
>> On 25 June 2010 21:34, Michal Suchanek <hramr...@centrum.cz> wrote:
>>>
>>> Perhaps fossil should have a "system encoding" which it would get from
>>> the environment (locales, windows codepage) and mark all commit
>>> messages with it.
>>
>> I vote that this is an extraordinarily bad idea.
>> Fossil is a distributed SCM system.  Potentially the distributed database in
>> question could be spread around the world.  Do you really want the nightmare
>> (and impossibility!) of trying to keep track of which project is in which
>> encoding scheme on which machine?  UTF-8 is a standard explicitly designed
>> to stop this kind of confusion.  It's also been around since 1993, so your
>> development tools have had plenty of time to catch on and actually use it.

The fact is that Windows is a supported platform, and on Windows the
common tools do not use UTF-8, for better or worse. So there should at
least be code to identify the system encoding and convert from it to
the repo encoding.
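
As an illustration only, here is a minimal sketch of that detect-and-convert step. It is written in Python for brevity (fossil itself is written in C), it assumes the repo encoding is UTF-8, and the names REPO_ENCODING, system_encoding and to_repo_encoding are hypothetical, not anything fossil provides.

```python
import locale

# Assumption for this sketch: the repository stores all text as UTF-8.
REPO_ENCODING = "utf-8"


def system_encoding() -> str:
    """Return the text encoding the platform reports for the current user.

    On Windows this is usually the ANSI code page (for example cp1252);
    on most Unix systems it is derived from the locale settings.
    """
    return locale.getpreferredencoding(False) or REPO_ENCODING


def to_repo_encoding(raw: bytes, source_encoding: str) -> bytes:
    """Convert bytes from source_encoding to the repo encoding, best effort.

    Bytes that are invalid in source_encoding become U+FFFD instead of
    raising, so the conversion never fails but may lose information.
    """
    text = raw.decode(source_encoding, errors="replace")
    return text.encode(REPO_ENCODING)
```

A commit message typed into a console would then be passed through to_repo_encoding(message_bytes, system_encoding()) before being stored.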

Also note that UTF-8, and Unicode in general, is not the encoding of
choice for CJK languages, for various reasons. I guess it is acceptable
to convert from the system encoding to UTF-8 on a best-effort basis
(which usually causes minimal loss of information, if any) so that the
repository commit messages and other texts shown on the web can be
merged together without resorting to iframes or similar
atrocities.

The tracked files themselves are, of course, free to be in any
encoding. Still, displaying files in an arbitrary encoding in a UTF-8
web app is somewhat troublesome, so it would be an advantage to be able
to start a repo in a different encoding, or to switch the web encoding,
so that files in different encodings can be viewed easily.
Tagging files with an encoding when fossil interprets them as text
would also be useful.
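
To make the tagging idea concrete, the sketch below shows how a per-file encoding tag might be used when rendering a file into a UTF-8 page. It is an assumption-laden illustration in Python, not fossil's behaviour: render_for_web, WEB_ENCODING and the idea of storing the tag as a plain encoding name are all hypothetical.

```python
from typing import Optional

# Assumption for this sketch: the web UI serves its pages as UTF-8.
WEB_ENCODING = "utf-8"


def render_for_web(raw: bytes, tagged_encoding: Optional[str]) -> str:
    """Decode file content for display, preferring the file's encoding tag.

    Order of attempts: the tagged encoding (if any), then UTF-8, then
    Latin-1 as a last resort.
    """
    candidates = [tagged_encoding] if tagged_encoding else []
    candidates.append(WEB_ENCODING)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown tag: fall through to the next candidate.
            continue
    # Latin-1 maps every byte to a code point, so this cannot fail,
    # although the result may be mojibake for the reader.
    return raw.decode("latin-1")
```

The decoded string can then be emitted in whatever encoding the page uses, so a Shift_JIS file and a cp1250 file could appear on the same page.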

Thanks

Michal
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
