Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

Owen Shepherd Fri, 25 Jun 2010 12:37:40 -0700

On 25 June 2010 19:36, Michal Suchanek <hramr...@centrum.cz> wrote:
> On 25 June 2010 20:18, Owen Shepherd <owen.sheph...@e43.eu> wrote:
>> One of the reasons that I'm a fan of SCSU is that, with even a
>> relatively simple encoder, it produces output which is comparable in
>> efficiency to that of most legacy encodings.
>
> SCSU is a horrendous encoding because it uses shifts. When the shift
> is lost the text has completely different meaning. In UTF-8 if you
> remove part of the text only that part is affected (if you cut
> mid-character you create a bad character at worst but it can be
> clearly detected).


And how often do you lose a couple of bytes in the middle of a file?
More precisely, how often do you lose them and not have a checksum
fail (or some other error) notifying you of this?

It's a particularly egregious complaint in the context of Fossil,
where all records are hashed anyway! Additionally, if the same kind of
corruption were to hit the SQLite file that contains the repository,
the repository would probably be trashed irretrievably.
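
To make the checksum point concrete, here is a minimal sketch of how a
content hash exposes a couple of lost bytes. The sample text and the
record_id helper are made up for illustration, and SHA-1 is used only
as an example hash; this is not Fossil's actual code path.

```python
import hashlib


def record_id(data: bytes) -> str:
    """Hypothetical helper: identify a record by the SHA-1 of its bytes."""
    return hashlib.sha1(data).hexdigest()


# Illustrative record content (not taken from any real repository).
original = "Привет, Fossil!".encode("utf-8")
stored_id = record_id(original)

# Simulate losing two bytes from the middle of the stored record.
damaged = original[:5] + original[7:]

intact_ok = record_id(original) == stored_id
damaged_ok = record_id(damaged) == stored_id
print("intact record verifies: ", intact_ok)
print("damaged record verifies:", damaged_ok)
```

Whatever encoding the text is stored in, the mismatch is caught before
anyone tries to decode the damaged bytes.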

Years of experience with binary and other modal file formats (XML and
HTML, to name two very common ones) show that this is a complete
non-issue.

SCSU is of course a poor choice for an in-memory format (use UTF-16)
or for interacting with the console (for backwards compatibility you're
probably going to have to use UTF-8). But as a storage format,
particularly one embedded within a database? It's pretty much perfect.
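
For a rough sense of the storage sizes involved, the sketch below
counts the bytes a short Russian string takes in CP1251, UTF-8 and
UTF-16. The sample string is my own. SCSU is not measured because
Python's standard library has no SCSU codec; the claim above is only
that its output lands close to the legacy-encoding figure.

```python
# Illustrative sample: 9 Cyrillic letters plus a comma and a space.
sample = "Привет, мир"

# Encoded size in bytes for each encoding (utf-16-le avoids counting a BOM).
sizes = {
    name: len(sample.encode(name))
    for name in ("cp1251", "utf-8", "utf-16-le")
}

for name, size in sizes.items():
    print(f"{name:10s} {size:3d} bytes")
```

The legacy single-byte encoding needs roughly half the space of either
Unicode form here, which is the gap a compact storage encoding is
meant to close.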
