Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-11 Thread Richard Wordingham
On Mon, 11 Feb 2013 02:45:27 +0100 Philippe Verdy verd...@wanadoo.fr wrote: 2013/2/10 Richard Wordingham richard.wording...@ntlworld.com: The term pathological could apply to these cases where a naive implementation may in fact break the expectations. How then can a collator become a

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-11 Thread Markus Scherer
On Mon, Feb 11, 2013 at 1:26 AM, Mark Davis ☕ m...@macchiato.com wrote: Bugs or requests can be filed at http://unicode.org/cldr/trac/newticket . The problem is overlaps between contractions and decomposition mappings. Richard did report this last year, for example for Danish:

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-10 Thread Philippe Verdy
2013/2/7 Richard Wordingham richard.wording...@ntlworld.com: You said, on 5 February, A process can be FULLY conforming by preserving the canonical equivalence and treating ALL strings that are canonically equivalent, without having to normalize them in any recommended form, or performing

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-10 Thread Richard Wordingham
On Sun, 10 Feb 2013 12:21:05 +0100 Philippe Verdy verd...@wanadoo.fr wrote: 2013/2/7 Richard Wordingham richard.wording...@ntlworld.com: You said, on 5 February, A process can be FULLY conforming by preserving the canonical equivalence and treating ALL strings that are canonically

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-10 Thread Philippe Verdy
2013/2/10 Richard Wordingham richard.wording...@ntlworld.com: Order is a problem when one has collating elements composed of multiple characters of different non-zero canonical combining classes. In practice this could be solved by adding more collating elements, but in theory the number of

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-07 Thread Richard Wordingham
On Wed, 6 Feb 2013 10:18:33 +0100 Philippe Verdy verd...@wanadoo.fr wrote: 2013/2/5 Richard Wordingham richard.wording...@ntlworld.com: Try doing UCA collation with U+0302 COMBINING CIRCUMFLEX ACCENT, U+0067 LATIN SMALL LETTER G being a collation element (with arbitrary collation

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-06 Thread Philippe Verdy
2013/2/5 Richard Wordingham richard.wording...@ntlworld.com: On Tue, 5 Feb 2013 12:16:47 +0100 Philippe Verdy verd...@wanadoo.fr wrote: A process can be FULLY conforming by preserving the canonical equivalence and treating ALL strings that are canonically equivalent, without having to

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-06 Thread Richard Wordingham
On Wed, 6 Feb 2013 10:18:33 +0100 Philippe Verdy verd...@wanadoo.fr wrote: 2013/2/5 Richard Wordingham richard.wording...@ntlworld.com: On Tue, 5 Feb 2013 12:16:47 +0100 Philippe Verdy verd...@wanadoo.fr wrote: A process can be FULLY conforming by preserving the canonical equivalence and

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-06 Thread Richard Wordingham
On Wed, 6 Feb 2013 20:35:04 + I richard.wording...@ntlworld.com wrote: The UCA default weighting necessarily has many 'defective' collation elements - every character forms a collating element! Correction: Every non-precomposed character forms a collating element. U+00E1 LATIN SMALL LETTER

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-06 Thread Philippe Verdy
2013/2/6 Richard Wordingham richard.wording...@ntlworld.com: On Wed, 6 Feb 2013 20:35:04 + I richard.wording...@ntlworld.com wrote: The UCA default weighting necessarily has many 'defective' collation elements - every character forms a collating element! Correction: Every

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-05 Thread Richard Wordingham
On Mon, 4 Feb 2013 23:54:41 +0100 Philippe Verdy verd...@wanadoo.fr wrote: 2013/2/3 Costello, Roger L. coste...@mitre.org: - It is easier to use a few keystrokes for combining accents than to set up compose key sequences for all the possible composed characters. But MOST texts using

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-05 Thread Philippe Verdy
2013/2/5 Richard Wordingham richard.wording...@ntlworld.com: Philippe Verdy verd...@wanadoo.fr wrote: But if the W3C needs to update something, it's to say that ALL forms that are canonically equivalent should be treated equally. This means that it is to the recipient of encoded documents to

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-05 Thread Richard Wordingham
On Tue, 5 Feb 2013 12:16:47 +0100 Philippe Verdy verd...@wanadoo.fr wrote: A process can be FULLY conforming by preserving the canonical equivalence and treating ALL strings that are canonically equivalent, without having to normalize them in any recommended form,... Try doing UCA collation
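For readers unfamiliar with the term, canonical equivalence can be tested in a simple (if not the most efficient) way by comparing normalized forms. The following is only a minimal sketch using Python's standard `unicodedata` module; the function name `canonically_equivalent` is made up for illustration, and the sketch says nothing about the collation problem raised in this message.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Precomposed U+011D versus g followed by U+0302 COMBINING CIRCUMFLEX ACCENT.
print(canonically_equivalent("\u011d", "g\u0302"))

# Marks with different combining classes may be reordered without changing the text.
print(canonically_equivalent("a\u0323\u0301", "a\u0301\u0323"))

# A circumflex placed before the g is a different string altogether.
print(canonically_equivalent("\u0302g", "g\u0302"))
```

Note that this approach normalizes internally, so it does not show a process that handles all equivalent strings without normalizing; it only makes the definition concrete.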

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-04 Thread Martin J. Dürst
Hello Roger, The conclusion to your question below is a very clear NO. The reason is that most text is already in NFC. In fact, as I wrote a few days or weeks ago, NFC was defined to capture what's usually around on the Web (and in other places, too). Trying to recommend that everything be in

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-04 Thread Philippe Verdy
2013/2/3 Costello, Roger L. coste...@mitre.org: - It is easier to use a few keystrokes for combining accents than to set up compose key sequences for all the possible composed characters. But MOST texts using combining diacritics are written in languages for which there already exists standard

RE: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-03 Thread Costello, Roger L.
Hi Folks, Thank you for your excellent responses. Based on your responses, I now wonder why the W3C recommends NFC be used for text exchanges over the Internet. Aside from the size advantage of NFC, there seems to be tremendous advantages to using NFD: - It’s easier to do searches and other
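One way the search advantage of decomposed text is commonly exploited is accent-insensitive matching: decompose, drop the combining marks, then compare. The sketch below assumes Python's standard `unicodedata` module; the helper names `strip_marks` and `accent_insensitive_find` are hypothetical and not taken from any message in this thread.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD and drop every character in a Mark category (Mn, Mc, Me)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )


def accent_insensitive_find(haystack: str, needle: str) -> bool:
    """Substring test that ignores combining marks and letter case."""
    return strip_marks(needle).casefold() in strip_marks(haystack).casefold()


print(strip_marks("r\u00e9sum\u00e9"))
print(accent_insensitive_find("Zo\u00eb's R\u00e9sum\u00e9", "resume"))
```

Because the input is decomposed inside the helper, this works whether the stored text is in NFC or NFD; the point is only that the mark-stripping step is trivial once the text is decomposed.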

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-02 Thread Julian Bradfield
On 2013-02-02, Richard Wordingham richard.wording...@ntlworld.com wrote: On Fri, 1 Feb 2013 23:51:34 + (GMT) Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote: ... But if you use a member of the Keyman family of input methods (I've been using Keyman for Linux (KMFL), you can set up a

Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread Costello, Roger L.
Hi Folks, The W3C recommends [1] text sent out over the Internet be in Normalization Form C (NFC): This document therefore chooses NFC as the base for Web-related early normalization. So why would one ever generate text in decomposed form (NFD)? Do any programming languages output text

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread Ian Clifton
Costello, Roger L. coste...@mitre.org writes: [...] So why would one ever generate text in decomposed form (NFD)? Some authors—naïve ones perhaps—might use composing accents because it’s easier for them to remember a handful of useful composing accents than the much larger number of

Re: [unicode] Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread suzuki toshiya
Hi, Do any programming languages output text in NFD? Does Java? Python? C#? Perl? JavaScript? It might not be the example you want, but recent Mac OS X stores filenames in an NFD-derived encoding. http://developer.apple.com/library/mac/#qa/qa1173/_index.html Regards, mpsuzuki Costello, Roger L.
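A practical consequence is that a filename read back from such a file system may not be code-point-identical to the name the program originally supplied. A minimal sketch of a tolerant comparison, assuming Python's standard `unicodedata` module (the function name `same_filename` is hypothetical):

```python
import unicodedata


def same_filename(a: str, b: str) -> bool:
    """Compare two filenames after bringing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


typed = "caf\u00e9.txt"      # precomposed e-acute, as typically typed
on_disk = "cafe\u0301.txt"   # e followed by U+0301, as a decomposing file system might return it

print(typed == on_disk)
print(same_filename(typed, on_disk))
```

Normalizing both sides to the same form is the design choice here; which form is used for the comparison does not matter as long as it is applied to both names.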

Re: [unicode] Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread Leif Halvard Silli
suzuki toshiya, Fri, 01 Feb 2013 23:39:56 +0900: Do any programming languages output text in NFD? Does Java? Python? C#? Perl? JavaScript? It might not be an example you want, recent Mac OS X stores the filenames in NFD-derived encoding.

Re: [unicode] Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread Markus Scherer
Note that text in NFC or NFKC can still contain combining marks: Not every user character has a single code point, and some composites have the Composition_Exclusion property. http://www.unicode.org/faq/normalization.html#11 http://www.unicode.org/reports/tr15/ http://www.unicode.org/notes/tn5/
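Both cases can be checked with a short script. This is a sketch using Python's standard `unicodedata` module; the example characters are chosen for illustration and the helper name `has_mark` is made up here.

```python
import unicodedata


def has_mark(text: str) -> bool:
    """True if any character belongs to a Mark category (Mn, Mc, Me)."""
    return any(unicodedata.category(ch).startswith("M") for ch in text)


# Case 1: no precomposed code point exists for q with tilde,
# so NFC leaves the combining tilde in place.
q_tilde = unicodedata.normalize("NFC", "q\u0303")
print(len(q_tilde), has_mark(q_tilde))

# Case 2: U+0958 DEVANAGARI LETTER QA is a composition exclusion,
# so NFC yields the decomposed pair U+0915 U+093C.
qa = unicodedata.normalize("NFC", "\u0958")
print([f"U+{ord(ch):04X}" for ch in qa], has_mark(qa))

# For contrast, e followed by U+0301 does compose to U+00E9 under NFC.
e_acute = unicodedata.normalize("NFC", "e\u0301")
print(len(e_acute), has_mark(e_acute))
```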

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread John H. Jenkins
On 2013年2月1日, at 上午6:07, Costello, Roger L. coste...@mitre.org wrote: So why would one ever generate text in decomposed form (NFD)? The Unihan database is stored in NFD because it makes the regular expressions used to qualify its contents much, *much* simpler. I imagine that things like
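To illustrate why decomposed text can simplify regular expressions, consider validating a romanized reading that carries tone marks. In NFD the pattern needs only base letters plus a handful of combining marks, instead of an enumeration of every precomposed letter. The pattern below is a hypothetical sketch and is not the expression actually used for the Unihan database.

```python
import re
import unicodedata

# Base letter followed by zero or more of: macron, acute, caron, grave, diaeresis.
# Hypothetical pattern for illustration only.
READING_NFD = re.compile(r"(?:[a-z][\u0304\u0301\u030c\u0300\u0308]*)+")


def is_reading_like(reading: str) -> bool:
    """Check a reading against the NFD pattern, decomposing the input first."""
    decomposed = unicodedata.normalize("NFD", reading)
    return READING_NFD.fullmatch(decomposed) is not None


print(is_reading_like("h\u01ceo"))   # a with caron, precomposed
print(is_reading_like("l\u01dc"))    # u with diaeresis and grave, precomposed
print(is_reading_like("hao3"))       # digit is not allowed by the pattern
```

An equivalent pattern over NFC text would have to list each accented vowel individually, which is the kind of complexity the message alludes to.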

RE: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread Phillips, Addison
[mailto:unicode-bou...@unicode.org] On Behalf Of Costello, Roger L. Sent: Friday, February 01, 2013 6:07 AM To: unicode@unicode.org Subject: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form? Hi Folks, The W3C recommends [1] text

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread Julian Bradfield
On 2013-02-01, Costello, Roger L. coste...@mitre.org wrote: So why would one ever generate text in decomposed form (NFD)? Text that I type is quite likely to be in decomposed (or at least not composed) form, because I find it a lot easier to have a few keystrokes for combining accents than to

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread Richard Wordingham
On Fri, 1 Feb 2013 14:07:19 + Costello, Roger L. coste...@mitre.org wrote: Hi Folks, The W3C recommends [1] text sent out over the Internet be in Normalized Form C (NFC): This document therefore chooses NFC as the base for Web-related early normalization. So why would one

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread Richard Wordingham
On Fri, 1 Feb 2013 23:51:34 + (GMT) Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote: On 2013-02-01, Costello, Roger L. coste...@mitre.org wrote: So why would one ever generate text in decomposed form (NFD)? Text that I type is quite likely to be in decomposed (or at least not

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-01 Thread Andrew Cunningham
Hi Roger, The situation is complex. Few applications and web services bother with normalisation, so what you get, i.e. NFC or NFD or other ... often depends on which language you are using and what input framework you are using. Some keyboard layouts will produce NFC output, some keyboard