Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

Owen Shepherd Fri, 25 Jun 2010 11:18:57 -0700

One of the reasons that I'm a fan of SCSU is that, with even a
relatively simple encoder, it produces output which is comparable in
efficiency to that of most legacy encodings.
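
To put rough numbers on the size argument, here is a minimal Python sketch that encodes a short Russian sample in CP1251, UTF-8 and UTF-16 and counts the bytes. The sample string and the `sizes` name are illustrative assumptions, not taken from this thread. SCSU is not measured because, as far as I know, the Python standard library ships no SCSU codec; the claim above is that an SCSU encoder would land close to the CP1251 figure.

```python
# Compare encoded sizes of a short Russian sample in several encodings.
# The sample text is an arbitrary illustration (9 Cyrillic letters,
# a comma and a space), not something quoted from the thread.
sample = "Привет, мир"

sizes = {
    name: len(sample.encode(name))
    for name in ("cp1251", "utf-8", "utf-16-le")
}

for name, size in sizes.items():
    print(f"{name:10s} {size} bytes")
```

For this sample the single-byte legacy code page needs one byte per character, while UTF-8 needs two bytes for every Cyrillic letter.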


On 25 June 2010 18:53, Michal Suchanek <hramr...@centrum.cz> wrote:
> On 25 June 2010 18:09, Owen Shepherd <owen.sheph...@e43.eu> wrote:
>> The trouble is that UTF-8 is a poor standard. It bloats many texts, is
>> quite expensive to parse, and has only one redeeming feature: It never
>> creates embedded nulls. I suppose that sharing its encoding with
>> ASCII is a feature too, but only a minor one.
>>
>> Personally, I think that most systems should adopt SCSU as their
>> storage encoding, but that's unlikely to happen until C strings and
>> MIME (two paragons of awfulness) die out.
>>
>> On 25 June 2010 16:00, Michael Richter <ttmrich...@gmail.com> wrote:
>>> On 25 June 2010 21:34, Michal Suchanek <hramr...@centrum.cz> wrote:
>>>>
>>>> Perhaps fossil should have a "system encoding" which it would get from
>>>> the environment (locales, windows codepage) and mark all commit
>>>> messages with it.
>>>
>>> I vote that this is an extraordinarily bad idea.
>>> Fossil is a distributed SCM system.  Potentially the distributed database in
>>> question could be spread around the world.  Do you really want the nightmare
>>> (and impossibility!) of trying to keep track of which project is in which
>>> encoding scheme on which machine?  UTF-8 is a standard explicitly designed
>>> to stop this kind of confusion.  It's also been around since 1993, so your
>>> development tools have had plenty of time to catch on and actually use it.
>
> The fact is that Windows is a supported platform and on Windows common
> tools do not use UTF-8, for better or worse. So there should at least
> be the code to identify the system encoding and convert it to the repo
> encoding.
>
> Also note that UTF-8 and Unicode in general are not the encodings of
> choice for CJK languages for various reasons. I guess it is acceptable
> to convert from the system encoding to UTF-8 on a best-effort basis
> (which usually causes minimal loss of information if any) so that the
> repository commit messages and other texts shown on the web can be
> merged together without resorting to iframes or other similar
> atrocities.
>
> The tracked files themselves are, of course, free to be in any
> encoding. Still, displaying files in an arbitrary encoding on a UTF-8 web
> app is somewhat troublesome, so it would be an advantage to have the
> possibility to start a repo in a different encoding or to switch the web
> encoding so that files in different encodings can be viewed easily.
> Tagging the files with an encoding when they are interpreted as text
> by fossil would also be useful.
>
> Thanks
>
> Michal
> _______________________________________________
> fossil-users mailing list
> fossil-users@lists.fossil-scm.org
> http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
>
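
The best-effort conversion from the system encoding to the repository encoding suggested in the quoted message could be sketched as follows. This is only an illustration in Python under stated assumptions, not how fossil does or would do it: `to_repo_encoding` is a hypothetical helper name, and UTF-8 is assumed as the repository encoding. Bytes that are invalid in the source encoding are replaced with U+FFFD rather than rejected, which is the "minimal loss of information" trade-off.

```python
import locale
from typing import Optional


def to_repo_encoding(raw: bytes, source_encoding: Optional[str] = None) -> bytes:
    """Hypothetical helper: transcode text bytes to UTF-8 on a best-effort basis."""
    if source_encoding is None:
        # Ask the platform for its preferred encoding (locale or code page)
        # without changing the process-wide locale settings.
        source_encoding = locale.getpreferredencoding(False)
    # errors="replace" maps undecodable bytes to U+FFFD instead of raising.
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


# A commit message typed on a CP1251 system, passed with an explicit encoding.
message = to_repo_encoding("Привет".encode("cp1251"), "cp1251")
```

Passing the encoding explicitly, as in the usage line, avoids guessing; relying on the platform default only works when the bytes really came from the local system.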