On 2013/01/08 14:43, Stephan Stiller wrote:
Wouldn't the clean way be to ensure valid strings (only) when they're
built
Of course, the earlier erroneous data gets caught, the better. The
problem is that error checking is expensive, both in lines of code and
in execution time (I think there
Wouldn't the clean way be to ensure valid strings (only) when they're
built
Of course, the earlier erroneous data gets caught, the better. The problem
is that error checking is expensive, both in lines of code and in execution
time (I think there is data showing that in any real-life
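The idea of validating strings once, when they are built, can be sketched in Python. This is a minimal illustration, not anyone's actual implementation from the thread. `build_string` is a hypothetical constructor that assumes its input is a sequence of 16-bit UTF-16 code units. It pairs surrogates as it goes and refuses to produce a string that contains a lone surrogate.

```python
from typing import List, Sequence


def build_string(units: Sequence[int]) -> str:
    """Hypothetical constructor: turn UTF-16 code units into a str,
    refusing any lone (unpaired) surrogate instead of storing it."""
    out: List[str] = []
    i = 0
    while i < len(units):
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            raise ValueError(f"not a 16-bit code unit: {u!r} at index {i}")
        if 0xD800 <= u <= 0xDBFF:  # high (leading) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                # Combine the pair into one supplementary code point.
                out.append(chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)))
                i += 2
                continue
            raise ValueError(f"lone high surrogate U+{u:04X} at index {i}")
        if 0xDC00 <= u <= 0xDFFF:  # low surrogate not preceded by a high one
            raise ValueError(f"lone low surrogate U+{u:04X} at index {i}")
        out.append(chr(u))
        i += 1
    return "".join(out)


# A well-formed sequence: "A" followed by a surrogate pair for U+1F600.
print(build_string([0x0041, 0xD83D, 0xDE00]) == "A\U0001F600")  # True

# An ill-formed sequence is rejected at construction time.
try:
    build_string([0xD800, 0x0041])
except ValueError as err:
    print(err)  # lone high surrogate U+D800 at index 0
```

The check costs one linear pass per string, paid once. After that, code that sorts or compares the strings can assume they are well formed.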
On 08/01/2013 01:26, Ben Scarborough wrote:
This isn't directly related to Unicode, but I thought this would be a
good place to ask.
Specifically, I'm curious about figure 14 (Gordon 1982) from WG2 N3218
[http://std.dkuug.dk/jtc1/sc2/wg2/docs/N3218.pdf], which says:
Whereas our so-called
Sorry, but I have to disagree here. If a list of strings contains items
with lone surrogates (garbage), then sorting them doesn't make the
garbage go away, even if the items may be sorted in correct order
according to some criterion.
Well, yeah, I wasn't claiming that the principled, correct
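The point that sorting does not make lone surrogates go away can be shown with a small Python example. It relies on the fact that a Python `str`, unlike well-formed Unicode text, can hold surrogate code points. `is_well_formed` is a hypothetical helper that uses strict UTF-8 encoding as its validity test.

```python
def is_well_formed(s: str) -> bool:
    # A Python str may contain surrogate code points (U+D800..U+DFFF);
    # these are not Unicode scalar values, so strict UTF-8 encoding rejects them.
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


items = ["b", "a\ud800", "c"]  # the middle item carries a lone surrogate
ordered = sorted(items)        # compares code point by code point

print(ordered == ["a\ud800", "b", "c"])      # True: sorted as requested
print([is_well_formed(s) for s in ordered])  # [False, True, True]: garbage still there
```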
Thank you for commenting and Happy New Year.
CP-1252 is a perfectly legal web character set, and nobody is going to
argue with you if you want to use it in legal ways. (I.e. writing
Latin script in it, not Sinhala.) But .
Okay, what is implied is I am doing something illegal. Define what I am
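One concrete way to see the difference between Latin script and Sinhala in CP-1252 is to try encoding each with Python's `cp1252` codec. This is only an illustration of what the character set covers. `fits_cp1252` is a hypothetical helper, and the sample strings are chosen here, not taken from the thread. Latin letters with diacritics have CP-1252 bytes. Characters from the Sinhala block (U+0D80..U+0DFF) have none, so CP-1252 text can only carry Sinhala indirectly, for example as a romanization.

```python
def fits_cp1252(text: str) -> bool:
    """True if every character in text has a CP-1252 (Windows-1252) byte."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


latin = "déjà vu, façade, naïve"            # Latin script with diacritics
sinhala = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"  # "සිංහල" in the Sinhala block

print(fits_cp1252(latin))    # True
print(fits_cp1252(sinhala))  # False
```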
2013-01-08 23:56, Naena Guru wrote:
May I ask if the following two are Latin script, English or Singhala?
1. This is written in English.
2. mee laþingaþa síhalayi.
For me, both are Latin script; 1 is English and 2 is Singhala (that is,
'this is romanized Singhala').
Text 2 is
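Whether the letters of the two sample texts are Latin script can be checked from their Unicode character names. This is a rough sketch. Python's standard `unicodedata` module does not expose the Unicode Script property, so the hypothetical helper `looks_latin` uses the `LATIN ` name prefix as a stand-in. Note that nothing in the encoding records which *language* a text is in.

```python
import unicodedata


def letter_names(text: str) -> list:
    # Unicode name of every alphabetic character in text.
    return [unicodedata.name(c) for c in text if c.isalpha()]


def looks_latin(text: str) -> bool:
    # Rough proxy for the Script property: check the "LATIN " name prefix.
    return all(name.startswith("LATIN ") for name in letter_names(text))


text1 = "This is written in English."
text2 = "mee laþingaþa síhalayi."

print(looks_latin(text1), looks_latin(text2))  # True True
print(unicodedata.name("þ"))                   # LATIN SMALL LETTER THORN
```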
I for one am so glad we now have Unicode.
I remember when in pre-Unicode days my then-girlfriend was writing a PhD
thesis in German about Russian linguistics. She had fonts for both
alphabets, but due to technical limitations the different letters had to
share the same code points. And at
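The anecdote does not say which encodings or fonts were involved. As a general illustration of letters sharing code points, two common 8-bit code pages assign the same byte to different letters. This example uses Latin-1 and Windows-1251, which is an assumption chosen for the example. In Unicode the two letters get distinct code points.

```python
raw = bytes([0xE4])  # one byte, read under two different 8-bit encodings

as_latin1 = raw.decode("latin-1")  # Western European
as_cp1251 = raw.decode("cp1251")   # Cyrillic (Windows-1251)

print(as_latin1, as_cp1251)                      # ä д
print(hex(ord(as_latin1)), hex(ord(as_cp1251)))  # 0xe4 0x434
```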
Naena Guru, Tue, 8 Jan 2013 15:56:52 -0600:
The statement,
the death of most character sets makes everyone's systems smaller and
faster
is *FALSE*. Compare the sizes of the following two files that are copies of
a newspaper article. The top part in red has a few more words in romanized
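The two files mentioned are not reproduced here, so this sketch makes no claim about them. It only shows per-character storage costs. Latin-1 letters such as þ and í take one byte in CP-1252 but two in UTF-8. Each Sinhala-block code point takes three bytes in UTF-8 and two in UTF-16. `byte_sizes` is a hypothetical helper.

```python
def byte_sizes(text: str, encodings: list) -> dict:
    # Encoded length in bytes of text under each named codec.
    return {enc: len(text.encode(enc)) for enc in encodings}


romanized = "mee laþingaþa síhalayi."       # text 2 from the thread, Latin script
sinhala = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"  # 5 code points from the Sinhala block

print(len(romanized), byte_sizes(romanized, ["cp1252", "utf-8"]))
# 23 {'cp1252': 23, 'utf-8': 26}
print(len(sinhala), byte_sizes(sinhala, ["utf-8", "utf-16-le"]))
# 5 {'utf-8': 15, 'utf-16-le': 10}
```

The romanized and Sinhala-script strings are not the same text. These numbers are therefore per-character costs, not a comparison of equivalent documents.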
Farewell to Mark Crispin, a true friend of Unicode.
http://en.wikipedia.org/wiki/Mark_Crispin
https://www.ietf.org/mail-archive/web/imap5/current/msg00571.html
Michael Everson * http://www.evertype.com/