Re: What does it mean to not be a valid string in Unicode?

2013-01-08 Thread Martin J. Dürst
On 2013/01/08 14:43, Stephan Stiller wrote: Wouldn't the clean way be to ensure valid strings (only) when they're built? Of course, the earlier erroneous data gets caught, the better. The problem is that error checking is expensive, both in lines of code and in execution time (I think there
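Validating strings once, when they are built, can be sketched in Python. This is a minimal illustration that assumes strings are held as sequences of 16-bit UTF-16 code units. The names `is_well_formed_utf16` and `ValidUtf16String` are hypothetical and do not come from the thread or from any library. The check rejects a high surrogate (U+D800..U+DBFF) that is not immediately followed by a low surrogate (U+DC00..U+DFFF), and it rejects a low surrogate that appears on its own.

```python
from typing import Sequence


def is_well_formed_utf16(units: Sequence[int]) -> bool:
    """Return True if the code units contain no lone surrogates.

    Assumes each element is already a 16-bit code unit (0..0xFFFF).
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (leading) surrogate
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed surrogate pair
                continue
            return False  # high surrogate with no low surrogate after it
        if 0xDC00 <= u <= 0xDFFF:  # low surrogate with no high surrogate before it
            return False
        i += 1
    return True


class ValidUtf16String:
    """Hypothetical wrapper: the well-formedness check runs once, at construction."""

    def __init__(self, units: Sequence[int]) -> None:
        if not is_well_formed_utf16(units):
            raise ValueError("ill-formed UTF-16: lone surrogate")
        self.units = tuple(units)


ok = ValidUtf16String([0x0048, 0xD83D, 0xDE00])  # 'H' followed by a pair for U+1F600
print(ok.units)
```

Paying for the check once in the constructor lets later code assume well-formed data; the cost is one linear scan per constructed string, which is the trade-off the excerpt weighs. Passing `[0x0048, 0xD83D]` would raise `ValueError`.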

Re: What does it mean to not be a valid string in Unicode?

2013-01-08 Thread Stephan Stiller
Wouldn't the clean way be to ensure valid strings (only) when they're built? Of course, the earlier erroneous data gets caught, the better. The problem is that error checking is expensive, both in lines of code and in execution time (I think there is data showing that in any real-life

Re: Q is a Roman numeral?

2013-01-08 Thread Frédéric Grosshans
On 08/01/2013 01:26, Ben Scarborough wrote: This isn't directly related to Unicode, but I thought this would be a good place to ask. Specifically, I'm curious about figure 14 (Gordon 1982) from WG2 N3218 [http://std.dkuug.dk/jtc1/sc2/wg2/docs/N3218.pdf], which says: Whereas our so-called

RE: What does it mean to not be a valid string in Unicode?

2013-01-08 Thread Whistler, Ken
Sorry, but I have to disagree here. If a list of strings contains items with lone surrogates (garbage), then sorting them doesn't make the garbage go away, even if the items may be sorted in correct order according to some criterion. Well, yeah, I wasn't claiming that the principled, correct
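The point that sorting does not remove ill-formed items can be shown with a small Python sketch. It assumes Python `str` values, where a lone surrogate appears as a code point in U+D800..U+DFFF; `has_lone_surrogate` and `sort_and_report` are hypothetical helper names used only for illustration.

```python
def has_lone_surrogate(s: str) -> bool:
    """True if s contains a surrogate code point (U+D800..U+DFFF).

    A Python str is a sequence of code points, so any surrogate in it is
    not part of a valid scalar value; strict UTF-8 encoding rejects it.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def sort_and_report(items: list[str]) -> tuple[list[str], list[str]]:
    """Sort in code point order, then list the ill-formed items still present."""
    ordered = sorted(items)
    garbage = [s for s in ordered if has_lone_surrogate(s)]
    return ordered, garbage


ordered, garbage = sort_and_report(["banana", "apple\ud800", "cherry"])
# The list is now in order, but the ill-formed item is still in it.
print(len(ordered), len(garbage))  # 3 1
```

The sort succeeds and places `"apple\ud800"` first, yet that item would still fail `"apple\ud800".encode("utf-8")`; any cleanup has to be a separate, explicit step.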

Re: Interoperability is getting better ... What does that mean?

2013-01-08 Thread Naena Guru
Thank you for commenting and Happy New Year. CP-1252 is a perfectly legal web character set, and nobody is going to argue with you if you want to use it in legal ways. (I.e. writing Latin script in it, not Sinhala.) But . Okay, what is implied is that I am doing something illegal. Define what I am

Re: Interoperability is getting better ... What does that mean?

2013-01-08 Thread Jukka K. Korpela
2013-01-08 23:56, Naena Guru wrote: May I ask if the following two are Latin script, English or Singhala? 1. This is written in English. 2. mee laþingaþa síhalayi. For me, both are Latin script and 1 is English and 2 is Singhala (says 'this is romanized Singhala'). Text 2 is

Re: Interoperability is getting better ... What does that mean?

2013-01-08 Thread Charlie Ruland
I for one am so glad we now have Unicode. I remember when in pre-Unicode days my then-girlfriend was writing a PhD thesis in German about Russian linguistics. She had fonts for both alphabets, but due to technical limitations the different letters had to share the same code points. And at

Re: Interoperability is getting better ... What does that mean?

2013-01-08 Thread Leif Halvard Silli
Naena Guru, Tue, 8 Jan 2013 15:56:52 -0600: The statement "the death of most character sets makes everyone's systems smaller and faster" is *FALSE*. Compare the sizes of the following two files that are copies of a newspaper article. The top part in red has a few more words in romanized
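The per-character storage cost behind this size argument can be checked directly. This Python sketch uses the romanized sample quoted in the thread ("mee laþingaþa síhalayi") and, as an assumed illustration, the word සිංහල ("Sinhala") written in Sinhala script. The two samples are different texts, so the numbers show per-character cost only, not a comparison of equivalent documents. `byte_length` is a hypothetical helper.

```python
# Romanized sample quoted in the thread; þ and í are outside ASCII.
romanized = "mee laþingaþa síhalayi"
# Illustrative word (assumption): සිංහල = U+0DC3 U+0DD2 U+0D82 U+0DC4 U+0DBD
sinhala_word = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD"


def byte_length(text: str, encoding: str) -> int:
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


print(byte_length(romanized, "cp1252"))         # 22: one byte per character
print(byte_length(romanized, "utf-8"))          # 25: þ, þ, í take two bytes each
print(byte_length(sinhala_word, "utf-8"))       # 15: three bytes per code point
print(byte_length(sinhala_word, "utf-16-le"))   # 10: two bytes per code point
```

Latin letters with diacritics such as þ and í take one byte in CP-1252 and two in UTF-8; each Sinhala-script code point (U+0D80..U+0DFF) takes three bytes in UTF-8 and two in UTF-16.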

Mark Crispin (1956-2012)

2013-01-08 Thread Michael Everson
Farewell to Mark Crispin, a true friend of Unicode. http://en.wikipedia.org/wiki/Mark_Crispin https://www.ietf.org/mail-archive/web/imap5/current/msg00571.html Michael Everson * http://www.evertype.com/