One of the reasons I'm a fan of SCSU is that, even with a relatively simple encoder, it produces output comparable in efficiency to that of most legacy encodings.
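To make "a relatively simple encoder" concrete, here is a minimal Python sketch of an SCSU encoder using the tag bytes and default window offsets from Unicode Technical Standard #6. It is an illustration, not a conformant or optimised implementation. The function names `scsu_encode` and `window_for` are hypothetical. The sketch handles only BMP characters, replaces dynamic windows round-robin, and drops into Unicode mode (plain UTF-16BE) whenever a character fits no window. Those policies are assumptions made for brevity; better encoders typically look ahead before choosing between windows, quoting and mode switches.

```python
# Minimal SCSU encoder sketch (after Unicode Technical Standard #6).
# Assumptions, not from the spec: BMP-only input, round-robin window
# replacement, and a naive "switch to Unicode mode when nothing fits" policy.

# Default dynamic window offsets defined by the SCSU specification.
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]

# Single-byte-mode tags.
SQ0, SQU, SCU, SC0, SD0 = 0x01, 0x0E, 0x0F, 0x10, 0x18
# Unicode-mode tags.
UC0, UQU = 0xE0, 0xF0


def window_for(cp):
    """Return (offset, x) so that 'SDn x' defines a window containing cp, else None."""
    offset = cp & ~0x7F
    if 0x0080 <= offset < 0x3400:
        return offset, offset >> 7              # x in 0x01..0x67
    if offset >= 0xE000:
        return offset, (offset - 0xAC00) >> 7   # x in 0x68..0xA7
    return None                                 # e.g. CJK ideographs, Hangul


def scsu_encode(text):
    windows = list(DEFAULT_WINDOWS)
    active = 0            # currently selected dynamic window
    next_def = 0          # next window to redefine (round robin)
    unicode_mode = False
    out = bytearray()

    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("this sketch only handles BMP, non-surrogate characters")
        passthrough = cp in (0x00, 0x09, 0x0A, 0x0D) or 0x20 <= cp <= 0x7F
        hit = next((n for n in range(8) if windows[n] <= cp < windows[n] + 0x80), None)

        if unicode_mode:
            if passthrough or hit is not None:
                # Leave Unicode mode, selecting the window the character needs.
                active = hit if hit is not None else active
                out.append(UC0 + active)
                unicode_mode = False
            else:
                if 0xE0 <= (cp >> 8) <= 0xF2:
                    out.append(UQU)             # high byte would collide with a tag
                out += cp.to_bytes(2, "big")
                continue

        if passthrough:
            out.append(cp)                      # ASCII and common controls as-is
        elif cp < 0x20:
            out += bytes([SQ0, cp])             # quote from static window 0 (U+0000..)
        elif hit is not None:
            if hit != active:
                out.append(SC0 + hit)           # switch to an existing window
                active = hit
            out.append(0x80 + cp - windows[active])
        elif (w := window_for(cp)) is not None:
            offset, x = w
            active, next_def = next_def, (next_def + 1) % 8
            windows[active] = offset
            out += bytes([SD0 + active, x, 0x80 + cp - offset])  # define + select
        else:
            out.append(SCU)                     # nothing fits: switch to UTF-16BE
            out += cp.to_bytes(2, "big")
            unicode_mode = True
    return bytes(out)
```

With this sketch, "Мир" encodes to 4 bytes (SC2 plus three window bytes), against 6 in UTF-8 and 3 in KOI8-R, and "日本" takes 5 bytes, against 6 in UTF-8 and 4 in Shift_JIS. Latin-1 text such as "café" comes out byte-for-byte identical to its ISO 8859-1 encoding.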
On 25 June 2010 18:53, Michal Suchanek <hramr...@centrum.cz> wrote:
> On 25 June 2010 18:09, Owen Shepherd <owen.sheph...@e43.eu> wrote:
>> The trouble is that UTF-8 is a poor standard. It bloats many texts, is
>> quite expensive to parse, and has only one redeeming feature: it never
>> creates embedded nulls. I suppose that sharing its encoding with
>> ASCII is a feature too, but only a minor one.
>>
>> Personally, I think that most systems should adopt SCSU as their
>> storage encoding, but that's unlikely to happen until C strings and
>> MIME (two paragons of awfulness) die out.
>>
>> On 25 June 2010 16:00, Michael Richter <ttmrich...@gmail.com> wrote:
>>> On 25 June 2010 21:34, Michal Suchanek <hramr...@centrum.cz> wrote:
>>>>
>>>> Perhaps fossil should have a "system encoding" which it would get from
>>>> the environment (locales, Windows codepage) and mark all commit
>>>> messages with it.
>>>
>>> I vote that this is an extraordinarily bad idea.
>>> Fossil is a distributed SCM system. Potentially the distributed database in
>>> question could be spread around the world. Do you really want the nightmare
>>> (and impossibility!) of trying to keep track of which project is in which
>>> encoding scheme on which machine? UTF-8 is a standard explicitly designed
>>> to stop this kind of confusion. It's also been around since 1993, so your
>>> development tools have had plenty of time to catch on and actually use it.
>
> The fact is that Windows is a supported platform, and on Windows common
> tools do not use UTF-8, for good or for bad. So there should at least
> be code to identify the system encoding and convert it to the repo
> encoding.
>
> Also note that UTF-8, and Unicode in general, is not the encoding of
> choice for CJK languages for various reasons.
> I guess it is acceptable to convert from the system encoding to UTF-8
> on a best-effort basis (which usually causes minimal loss of
> information, if any) so that the repository commit messages and other
> texts shown on the web can be merged together without resorting to
> iframes or other similar atrocities.
>
> The tracked files themselves are, of course, free to be in any
> encoding. Still, displaying files in arbitrary encodings in a UTF-8 web
> app is somewhat troublesome, so it would be an advantage to have the
> possibility to start a repo in a different encoding, or to switch the
> web encoding, so that files in different encodings can be viewed easily.
> Tagging the files with an encoding when they are interpreted as text
> by fossil would also be useful.
>
> Thanks
>
> Michal
> _______________________________________________
> fossil-users mailing list
> fossil-users@lists.fossil-scm.org
> http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users