On Mon, 11 Feb 2013 02:45:27 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
2013/2/10 Richard Wordingham richard.wording...@ntlworld.com:
The term pathological could apply to these cases where a naive
implementation may in fact break the expectations. How then can a
collator become a
On Mon, Feb 11, 2013 at 1:26 AM, Mark Davis ☕ m...@macchiato.com wrote:
Bugs or requests can be filed at http://unicode.org/cldr/trac/newticket .
The problem is overlaps between contractions and decomposition mappings.
Richard did report this last year, for example for Danish:
2013/2/7 Richard Wordingham richard.wording...@ntlworld.com:
You said, on 5 February,
A process can be FULLY conforming by preserving the canonical
equivalence and treating ALL strings that are canonically equivalent,
without having to normalize them in any recommended form, or
performing
On Sun, 10 Feb 2013 12:21:05 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
2013/2/7 Richard Wordingham richard.wording...@ntlworld.com:
You said, on 5 February,
A process can be FULLY conforming by preserving the canonical
equivalence and treating ALL strings that are canonically
2013/2/10 Richard Wordingham richard.wording...@ntlworld.com:
Order is a problem when one has collating elements composed of multiple
characters of different non-zero canonical combining classes. In
practice this could be solved by adding more collating elements, but
in theory the number of
On Wed, 6 Feb 2013 10:18:33 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
2013/2/5 Richard Wordingham richard.wording...@ntlworld.com:
Try doing UCA collation with U+0302 COMBINING CIRCUMFLEX ACCENT,
U+0067 LATIN SMALL LETTER G being a collation element (with
arbitrary collation
2013/2/5 Richard Wordingham richard.wording...@ntlworld.com:
On Tue, 5 Feb 2013 12:16:47 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
A process can be FULLY conforming by preserving the canonical
equivalence and treating ALL strings that are canonically equivalent,
without having to
On Wed, 6 Feb 2013 10:18:33 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
2013/2/5 Richard Wordingham richard.wording...@ntlworld.com:
On Tue, 5 Feb 2013 12:16:47 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
A process can be FULLY conforming by preserving the canonical
equivalence and
On Wed, 6 Feb 2013 20:35:04 +
I richard.wording...@ntlworld.com wrote:
The UCA default weighting necessarily has many 'defective'
collation elements - every character forms a collating element!
Correction: Every non-precomposed character forms a collating element.
U+00E1 LATIN SMALL LETTER
2013/2/6 Richard Wordingham richard.wording...@ntlworld.com:
On Wed, 6 Feb 2013 20:35:04 +
I richard.wording...@ntlworld.com wrote:
The UCA default weighting necessarily has many 'defective'
collation elements - every character forms a collating element!
Correction: Every
On Mon, 4 Feb 2013 23:54:41 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
2013/2/3 Costello, Roger L. coste...@mitre.org:
- It is easier to use a few keystrokes for combining accents than
to set up compose key sequences for all the possible composed
characters.
But MOST texts using
2013/2/5 Richard Wordingham richard.wording...@ntlworld.com:
Philippe Verdy verd...@wanadoo.fr wrote:
But if the W3C needs to update
something, it's to say that ALL forms that are canonically equivalent
should be treated equally. This means that it is to the recipient of
encoded documents to
On Tue, 5 Feb 2013 12:16:47 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
A process can be FULLY conforming by preserving the canonical
equivalence and treating ALL strings that are canonically equivalent,
without having to normalize them in any recommended form,...
Try doing UCA collation
Hello Roger,
The conclusion to your question below is a very clear NO. The reason is
that most text is already in NFC. In fact, as I wrote a few days or
weeks ago, NFC was defined to capture what's usually around on the Web
(and in other places, too). Trying to recommend that everything be in
2013/2/3 Costello, Roger L. coste...@mitre.org:
- It is easier to use a few keystrokes for combining accents than to set up
compose key sequences for all the possible composed characters.
But MOST texts using combining diacritics are written in languages for
which there already exists standard
Hi Folks,
Thank you for your excellent responses.
Based on your responses, I now wonder why the W3C recommends NFC be used for
text exchanges over the Internet. Aside from the size advantage of NFC, there
seem to be tremendous advantages to using NFD:
- It’s easier to do searches and other
On 2013-02-02, Richard Wordingham richard.wording...@ntlworld.com wrote:
On Fri, 1 Feb 2013 23:51:34 + (GMT)
Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote:
...
But if you use a member of the Keyman family of input methods (I've
been using Keyman for Linux (KMFL), you can set up a
Hi Folks,
The W3C recommends [1] text sent out over the Internet be in Normalized Form C
(NFC):
This document therefore chooses NFC as the
base for Web-related early normalization.
So why would one ever generate text in decomposed form (NFD)?
Do any programming languages output text
Costello, Roger L. coste...@mitre.org writes:
[...]
So why would one ever generate text in decomposed form (NFD)?
Some authors—naïve ones perhaps—might use composing accents because it’s
easier for them to remember a handful of useful composing accents than
the much larger number of
Hi,
Do any programming languages output text in NFD? Does Java? Python? C#? Perl?
JavaScript?
It might not be the example you want, but recent Mac OS X stores
filenames in an NFD-derived encoding.
http://developer.apple.com/library/mac/#qa/qa1173/_index.html
Regards,
mpsuzuki
Costello, Roger L.
suzuki toshiya, Fri, 01 Feb 2013 23:39:56 +0900:
Do any programming languages output text in NFD? Does Java? Python?
C#? Perl? JavaScript?
It might not be the example you want, but recent Mac OS X stores
filenames in an NFD-derived encoding.
Note that text in NFC or NFKC can still contain combining marks: Not every
user character has a single code point, and some composites have the
Composition_Exclusion property.
http://www.unicode.org/faq/normalization.html#11
http://www.unicode.org/reports/tr15/
http://www.unicode.org/notes/tn5/
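The point above, that NFC text can still contain combining marks, can be checked with a short sketch. It assumes Python's standard unicodedata module; the sample strings and the helper name has_combining_mark are illustrative choices, not taken from the thread.

```python
import unicodedata


def has_combining_mark(text):
    """Return True if any character has a non-zero canonical combining class."""
    return any(unicodedata.combining(ch) != 0 for ch in text)


# Case 1: no precomposed code point exists for q + COMBINING DOT BELOW,
# so NFC leaves the combining mark in place.
nfc_q = unicodedata.normalize("NFC", "q\u0323")

# Case 2: U+0958 DEVANAGARI LETTER QA is a composition exclusion,
# so NFC yields the decomposed pair U+0915 U+093C (letter + nukta).
nfc_qa = unicodedata.normalize("NFC", "\u0958")

# Contrast: e + COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
nfc_e = unicodedata.normalize("NFC", "e\u0301")

for label, s in [("q+dot below", nfc_q), ("U+0958", nfc_qa), ("e+acute", nfc_e)]:
    codepoints = " ".join("U+%04X" % ord(ch) for ch in s)
    print(label, "->", codepoints, "| combining mark present:", has_combining_mark(s))
```

Only the third string ends up free of combining marks, so a consumer cannot assume that NFC input contains none.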
On 1 February 2013, at 6:07 AM, Costello, Roger L. coste...@mitre.org wrote:
So why would one ever generate text in decomposed form (NFD)?
The Unihan database is stored in NFD because it makes the regular expressions
used to qualify its contents much, *much* simpler. I imagine that things like
[mailto:unicode-bou...@unicode.org] On
Behalf Of Costello, Roger L.
Sent: Friday, February 01, 2013 6:07 AM
To: unicode@unicode.org
Subject: Text in composed normalized form is king, right? Does anyone
generate text in decomposed normalized form?
Hi Folks,
The W3C recommends [1] text
On 2013-02-01, Costello, Roger L. coste...@mitre.org wrote:
So why would one ever generate text in decomposed form (NFD)?
Text that I type is quite likely to be in decomposed (or at least not
composed) form, because I find it a lot easier to have a few keystrokes
for combining accents than to
On Fri, 1 Feb 2013 14:07:19 +
Costello, Roger L. coste...@mitre.org wrote:
Hi Folks,
The W3C recommends [1] text sent out over the Internet be in
Normalized Form C (NFC):
This document therefore chooses NFC as the
base for Web-related early normalization.
So why would one
On Fri, 1 Feb 2013 23:51:34 + (GMT)
Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote:
On 2013-02-01, Costello, Roger L. coste...@mitre.org wrote:
So why would one ever generate text in decomposed form (NFD)?
Text that I type is quite likely to be in decomposed (or at least not
Hi Roger,
The situation is complex. Few applications and web services bother with
normalisation, so what you get, i.e. NFC or NFD or other ... often depends
on which language you are using and what input framework you are using.
Some keyboard layouts will produce NFC output,
some keyboard