Re: Unix Codes for Diacritics

2010-09-18 Thread Richard Wordingham
On Sat, 18 Sep 2010 00:06:07 +0100 Krishna Birth krishnabi...@gmail.com wrote: Could someone please correctly tell the codes to use on Unix operating systems to produce the below diacritics: A Ā = http://www.fileformat.info/info/unicode/char/0100/index.htm ... I need to find this for a

Re: Xmodmap Project - Please contact if interested in cooperating

2010-09-21 Thread Richard Wordingham
On Sun, 19 Sep 2010 19:39:35 +0100 Krishna Birth krishnabi...@gmail.com wrote: Correction: Could 7 characters to one key be possible? On Sun, Sep 19, 2010 at 7:37 PM, Krishna Birth krishnabi...@gmail.com wrote: The diacritics are usually typed with non-diacritic letter. It would be

Re: Lower Case l and Upper Case L with Candrabindu

2010-09-27 Thread Richard Wordingham
On Sun, 26 Sep 2010 22:58:31 +0530 Vinodh Rajan vinodh.vin...@gmail.com wrote: And I guess you are trying to mix characters from two different scripts - Latin and Devanagari. Nope. He is using the Generic Combining Candrabindu 0310 Which I suspect is only actively supported for use

Re: ch ligature in a monospace font

2011-06-29 Thread Richard Wordingham
On Wed, 29 Jun 2011 03:49:42 + Peter Constable peter...@microsoft.com wrote: From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of Jean-François Colson * In the C’HWERTY layout on Linux, the digraph and trigraph had to be replaced by six PUA characters

Re: ch ligature in a monospace font

2011-06-30 Thread Richard Wordingham
On Fri, 1 Jul 2011 01:57:46 +0200 Philippe Verdy verd...@wanadoo.fr wrote: CGJ is NOT made to create (or even hint) ligatures ; and certainly not in this context. Its main purpose is to indicate that a sequence of characters do not form a collating unit. However, if one is using a 'monospace'

Re: ch ligature in a monospace font

2011-07-01 Thread Richard Wordingham
On Fri, 1 Jul 2011 04:22:59 +0200 Philippe Verdy verd...@wanadoo.fr wrote: 2011/7/1 Richard Wordingham richard.wording...@ntlworld.com: Its main purpose is to indicate that a sequence of characters do not form a collating unit.  However, if one is using a 'monospace' font to space

Re: ch ligature in a monospace font

2011-07-04 Thread Richard Wordingham
On Sat, 2 Jul 2011 15:59:18 +0200 Philippe Verdy verd...@wanadoo.fr wrote: 2011/7/1 Richard Wordingham richard.wording...@ntlworld.com: I wonder if anyone has some statistics on the use of CGJ.  Its revised intended use was to disrupt collating sequences, but you may be right about its

Re: Sanskrit nasalized L

2011-08-14 Thread Richard Wordingham
On Fri, 24 Jun 2011 18:24:01 +0530 Shriramana Sharma samj...@gmail.com wrote: The point is that the sequence: la, virama, candrabindu, la is strictly speaking *the* sequence recommended *across* Indic scripts for representation of Sanskrit clusters involving a nasal and non-nasal

Re: Sanskrit nasalized L

2011-08-14 Thread Richard Wordingham
On Sun, 14 Aug 2011 19:59:30 +0530 Shriramana Sharma samj...@gmail.com wrote: On 08/14/2011 06:02 PM, Richard Wordingham wrote: On Fri, 24 Jun 2011 18:24:01 +0530 Shriramana Sharma samj...@gmail.com wrote: The point is that the sequence: la, virama, candrabindu, la is strictly

Greek Characters Duplicated as Latin (was: Sanskrit nasalized L)

2011-08-14 Thread Richard Wordingham
On Sat, 6 Aug 2011 17:25:11 -0700 tulasi tulas...@gmail.com wrote: - Why did Unicode Inc copy some letters/symbols from Greek-script irresponsibly and rename them as Latin-script? - Why didn't it (Unicode Inc) use the same Greek letters/symbols? U+00B5 MICRO SIGN is an ISO-8859-1 character,
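
The relationship between U+00B5 MICRO SIGN and the Greek letter mu can be inspected with Python's standard `unicodedata` module. The sketch below is only an illustration and is not taken from the thread; the helper name `fold_compat` is made up for this example.

```python
import unicodedata

MICRO_SIGN = "\u00B5"  # U+00B5 MICRO SIGN, inherited from ISO-8859-1
GREEK_MU = "\u03BC"    # U+03BC GREEK SMALL LETTER MU


def fold_compat(text: str) -> str:
    """Apply compatibility normalisation (NFKC) to the text."""
    return unicodedata.normalize("NFKC", text)


# The two characters are distinct code points...
print(unicodedata.name(MICRO_SIGN), "/", unicodedata.name(GREEK_MU))
# ...but the micro sign carries a compatibility decomposition to mu.
print(unicodedata.decomposition(MICRO_SIGN))
print(fold_compat(MICRO_SIGN) == GREEK_MU)
```

Canonical normalisation (NFC or NFD) leaves the micro sign alone; only the compatibility forms fold it to the Greek letter.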

Re: Sanskrit nasalized L

2011-08-15 Thread Richard Wordingham
On Mon, 15 Aug 2011 07:21:20 +0530 Shriramana Sharma samj...@gmail.com wrote: On 08/15/2011 01:48 AM, Richard Wordingham wrote: The issue is the relative ordering of candrabindu and virama. For a C1-conjoining form (i.e. C2 relatively unmodified), la virama candrabindu la is easier

Re: Non-standard Tibetan stacks

2011-08-17 Thread Richard Wordingham
On Tue, 16 Aug 2011 23:32:51 +0100 Andrew West andrewcw...@gmail.com wrote: Chris Fynn asked about certain non-standard stacks he was trying to implement in the Tibetan Machine Uni font in an email to the Tibex list on 2006-12-09, but these didn't involve multiple consonant-vowel sequences

Re: RTL PUA?

2011-08-20 Thread Richard Wordingham
On Fri, 19 Aug 2011 22:14:17 +0700 Martin Hosken martin_hos...@sil.org wrote: Therefore, I would suggest that a carefully allocated set of columns for non L directionality PUA characters be encoded. This PUA doesn't have to be big, with probably 1 column allocated per directionality. I'm no

Re: Code pages and Unicode

2011-08-20 Thread Richard Wordingham
On Fri, 19 Aug 2011 17:03:41 -0700 Ken Whistler k...@sybase.com wrote: O.k., so apparently we have awhile to go before we have to start worrying about the Y2K or IPv4 problem for Unicode. Call me again in the year 2851, and we'll still have 5 years left to design a new scheme and plan for the

Re: RTL PUA?

2011-08-20 Thread Richard Wordingham
On Sun, 21 Aug 2011 00:21:28 + Doug Ewell d...@ewellic.org wrote: The more I think of it, the more I like the idea of reassigning the default BC of Plane 16 to 'R'. What would the arguments against this be? BC of 'AL'? Richard.

Re: RTL PUA?

2011-08-21 Thread Richard Wordingham
On Sun, 21 Aug 2011 01:44:02 + Doug Ewell d...@ewellic.org wrote: The more I think of it, the more I like the idea of reassigning the default BC of Plane 16 to 'R'. What would the arguments against this be? BC of 'AL'? Would that really be a better default? I thought the main RTL

Re: RTL PUA?

2011-08-21 Thread Richard Wordingham
On Sun, 21 Aug 2011 11:00:26 -0600 Doug Ewell d...@ewellic.org wrote: I think as soon as we start talking about this many scenarios, we are no longer talking about what the *default* bidi class of the PUA (or some part of it) should be. Instead, we are talking about being able to specify

Re: RTL PUA?

2011-08-21 Thread Richard Wordingham
On Sun, 21 Aug 2011 23:55:46 + Doug Ewell d...@ewellic.org wrote: What's a LANGUAGE MARK? There are *three* strong directionalities - 'L' left-to-right, 'AL' right-to-left as in Arabic, 'R' right-to-left (as in Hebrew, I suspect). 'AL' and 'R' have different effects on certain characters
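
One place where 'AL' and 'R' behave differently is rule W2 of the Unicode Bidirectional Algorithm (UAX #9): a European number whose nearest preceding strong character is 'AL' is treated as an Arabic number, whereas after 'R' it stays a European number. The following is a deliberately simplified sketch of that single rule, assuming no embeddings and ignoring every other rule; `apply_w2` is a hypothetical helper name, not part of any library.

```python
import unicodedata

STRONG = {"L", "R", "AL"}


def apply_w2(text: str) -> list[str]:
    """Return bidi classes after applying only rule W2 (simplified)."""
    resolved = []
    last_strong = None  # stands in for 'sos'; no change is made in that case
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG:
            last_strong = cls
        elif cls == "EN" and last_strong == "AL":
            cls = "AN"  # European number becomes Arabic number after AL
        resolved.append(cls)
    return resolved


print(apply_w2("\u0627 1"))  # ARABIC LETTER ALEF, space, digit one
print(apply_w2("\u05D0 1"))  # HEBREW LETTER ALEF, space, digit one
```

A real implementation has to run rule W1 first and work within level runs, so this only shows where the two right-to-left classes part ways.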

Re: RTL PUA?

2011-08-21 Thread Richard Wordingham
On Sun, 21 Aug 2011 16:37:34 -0700 Asmus Freytag asm...@ix.netcom.com wrote: Treating PUA characters as ON is very problematic - their display would become context sensitive in unintended ways. No users of CJK characters would think of using LRM characters, but if text is inserted or viewed

Re: RTL PUA?

2011-08-22 Thread Richard Wordingham
On Mon, 22 Aug 2011 07:51:22 -0700 Doug Ewell d...@ewellic.org wrote: Some PUA properties, like glyph shapes and maybe directionality, can be stored in a font. Others, like numeric values and casing, might not or cannot. An interchangeable format needs to be agreed upon for the properties

Re: Code pages and Unicode

2011-08-22 Thread Richard Wordingham
On Mon, 22 Aug 2011 14:06:00 +0100 (BST) William_J_G Overington wjgo_10...@btinternet.com wrote: On Monday 22 August 2011, Andrew West andrewcw...@gmail.com wrote: Can anyone think of a way to extend UTF-16 without adding new surrogates or inventing a new general category? Andrew

Re: RTL PUA?

2011-08-23 Thread Richard Wordingham
On Mon, 22 Aug 2011 20:58:23 +0200 Philippe Verdy verd...@wanadoo.fr wrote: The computing order of features should not then be: - BiDi algorithm for reordering grapheme clusters (I trust you mean the ordering of clusters relative to one another, not the ordering within clusters.) - font

Re: Implement BIDI algorithm by line

2011-08-23 Thread Richard Wordingham
On Tue, 23 Aug 2011 10:02:05 +0800 li bo libo@gmail.com wrote: ...But I don't know why user must take a paragraph as a unit to determine the embedding levels. Why can't i shape the text first and then wrapping the line, and determining the embedding levels for characters within a line.

Re: Code pages and Unicode

2011-08-23 Thread Richard Wordingham
On Mon, 22 Aug 2011 16:18:56 -0700 Ken Whistler k...@sybase.com wrote: How about Clause 12.5 of ISO/IEC 10646: 001B, 0025, 0040 You escape out of UTF-16 to ISO 2022, and then you can do whatever the heck you want, including exchange and processing of complete 4-byte forms, with all the

Re: Multiple private agreements

2011-08-24 Thread Richard Wordingham
On Wed, 24 Aug 2011 07:34:05 +0200 Philippe Verdy verd...@wanadoo.fr wrote: 2011/8/24 Luke-Jr l...@dashjr.org: On Tuesday, August 23, 2011 10:29:58 PM Philippe Verdy wrote: Even the UTC could create its own PUA registry, It won't. The best you can hope for is a list of registries. Now

Re: Code pages and Unicode

2011-08-24 Thread Richard Wordingham
On Wed, 24 Aug 2011 08:02:42 -0700 Doug Ewell d...@ewellic.org wrote: But some people seem to be dead serious about the need to go beyond 1.1 million code points, and are making dead-serious arguments that we need to plan for it. Those are two different claims. 'Never say never' is a useful
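
For reference, the "1.1 million" figure follows from the shape of the codespace: 17 planes of 65,536 code points, of which 2,048 are surrogates. The arithmetic, and the UTF-16 surrogate mapping that fixes the upper limit, can be sketched as follows (the function name `to_surrogates` is only illustrative):

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000
SURROGATES = 0xDFFF - 0xD800 + 1  # U+D800..U+DFFF

code_points = PLANES * CODE_POINTS_PER_PLANE  # U+0000..U+10FFFF
scalar_values = code_points - SURROGATES      # code points excluding surrogates


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000                 # 20 bits of payload
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


print(code_points, scalar_values)
print([hex(u) for u in to_surrogates(0x10FFFF)])
```

Two surrogates of 10 payload bits each give 16 supplementary planes, which is why U+10FFFF is the last code point reachable from UTF-16.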

Re: Difference between Bidi_Class 'R' and 'AL'

2011-08-24 Thread Richard Wordingham
On Wed, 24 Aug 2011 08:35:48 -0700 Doug Ewell d...@ewellic.org wrote: UAX #44, Table 13 (Bidi_Class Values) includes the following descriptions: R - Right_To_Left - any strong right-to-left (non-Arabic-type) character AL - Arabic_Letter - any strong right-to-left (Arabic-type) character
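
The Bidi_Class values quoted from UAX #44 can be looked up directly in Python's `unicodedata` module. This is a small illustrative sketch; the sample characters are chosen here and do not come from the thread.

```python
import unicodedata

SAMPLES = {
    "A": "LATIN CAPITAL LETTER A",
    "\u05D0": "HEBREW LETTER ALEF",
    "\u0627": "ARABIC LETTER ALEF",
    "1": "DIGIT ONE",
    "\u0661": "ARABIC-INDIC DIGIT ONE",
}


def bidi_class(ch: str) -> str:
    """Return the Bidi_Class short name of a single character."""
    return unicodedata.bidirectional(ch)


for ch, name in SAMPLES.items():
    print(f"U+{ord(ch):04X} {name}: {bidi_class(ch)}")
```

The values reported depend on the version of the Unicode Character Database bundled with the Python interpreter in use.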

Re: Code pages and Unicode

2011-08-24 Thread Richard Wordingham
On Wed, 24 Aug 2011 12:40:54 -0700 Ken Whistler k...@sybase.com wrote: On 8/24/2011 10:48 AM, Richard Wordingham wrote: if, say, code points are squandered. Oh. Well, in that case, the correct action is to work to ensure that code points are not squandered. Have there not already

Re: Non-standard Tibetan stacks

2011-09-03 Thread Richard Wordingham
On Sat, 3 Sep 2011 09:39:34 +0600 Christopher Fynn chris.f...@gmail.com wrote: You can find quite a few non-standard stacks (those used in Tibetan abbreviations) in the book བསྡུ་ཡིག་གསེར་གྱི་ཨ་ལོང། which is freely available in PDF format from

Re: Continue:Glaring mistake in the code list for South Asian Script

2011-09-10 Thread Richard Wordingham
On Sat, 10 Sep 2011 12:33:47 +0600 Christopher Fynn chris.f...@gmail.com wrote: Characters only used for writing Assamese in the Bengali block is similar. As long as you can type all the characters necessary for writing your language, don't worry about names. Actually, names sometimes

Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Richard Wordingham
On Sat, 10 Sep 2011 22:19:27 +0200 Kent Karlsson kent.karlsso...@telia.com wrote: Den 2011-09-10 20:58, skrev Jukka K. Korpela jkorp...@cs.tut.fi: According to Oxford Style Manual, one should not use the fi ligature in Turkish, as that would obscure the distinction between normal i and

Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Richard Wordingham
On Sat, 10 Sep 2011 23:53:34 +0200 Kent Karlsson kent.karlsso...@telia.com wrote: IMO, a glyph (if any) for that compatibility character should look *exactly* like an fi (after automatic ligature formation, if that is done for fi) in the font used. So if no ligature for fi is formed, the

Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Richard Wordingham
On Sun, 11 Sep 2011 23:14:04 +0200 Kent Karlsson kent.karlsso...@telia.com wrote: Den 2011-09-11 18:53, skrev Peter Constable peter...@microsoft.com: Hence, in a monospaced font, FB01 certainly should look different from 0066, 0069, regardless of whether ligature glyphs are used in
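
The compatibility character under discussion, U+FB01 LATIN SMALL LIGATURE FI, is related to the sequence 0066 0069 only by a compatibility decomposition, so canonical normalisation keeps the two apart. A minimal sketch with Python's `unicodedata` module, offered as an illustration rather than anything from the thread:

```python
import unicodedata

FI_LIGATURE = "\uFB01"  # U+FB01 LATIN SMALL LIGATURE FI
FI_SEQUENCE = "\u0066\u0069"  # 'f' followed by 'i'

nfc = unicodedata.normalize("NFC", FI_LIGATURE)
nfkc = unicodedata.normalize("NFKC", FI_LIGATURE)

print(unicodedata.name(FI_LIGATURE))
print(unicodedata.decomposition(FI_LIGATURE))
print(nfc == FI_LIGATURE)   # canonical normalisation leaves it alone
print(nfkc == FI_SEQUENCE)  # compatibility normalisation unfolds it
```

Whether a font draws the two forms identically is a rendering matter that normalisation data cannot answer.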

Need for Level Direction Mark

2011-09-13 Thread Richard Wordingham
This is a summary of what I have already submitted for Public Review Issue 205 (http://www.unicode.org/review/pri205/). I am mentioning it here in case there is something wrong with my idea. My basic idea is that one does not need a 'level direction mark'. The desired effect can be achieved by

Re: Need for Level Direction Mark

2011-09-14 Thread Richard Wordingham
On Wed, 14 Sep 2011 03:31:14 +0200 Philippe Verdy verd...@wanadoo.fr wrote: In other words, the UTC policy about the stability of Bidi classes should be minimally relaxed, by rewording into something like: « The bidi class property value of any assigned code point is IMMUTABLE (and will

Re: Need for Level Direction Mark

2011-09-17 Thread Richard Wordingham
to the application of Rule W7 in the UBA do not ligate or kern with non-neutrals. (B) Non-displaying runs embedded within other runs have no effect on the display. I can make the conversion tables available on request. Second, responses to some of the suggestions/comments: 1. Richard Wordingham

Re: Need for Level Direction Mark

2011-09-19 Thread Richard Wordingham
On Mon, 19 Sep 2011 05:44:27 +0200 Philippe Verdy verd...@wanadoo.fr wrote: 2011/9/19 Peter Edberg pedb...@apple.com: snip The whole point of LDM was to be able to create semi-structured elements such as the example in UAX #9 section 5.6 *without* knowing in advance the direction

Re: Need for Level Direction Mark

2011-09-21 Thread Richard Wordingham
On Tue, 20 Sep 2011 01:48:45 +0200 Philippe Verdy verd...@wanadoo.fr wrote: 2011/9/20 Richard Wordingham richard.wording...@ntlworld.com: Because it also has practical applications (for example look at the current Wikimedia bug when it wants to display lists of category names, and insert

Re: Need for Level Direction Mark

2011-09-22 Thread Richard Wordingham
On Sun, 18 Sep 2011 20:21:38 Peter Edberg pedb...@apple.com wrote: On Sep 17, 2011, at 7:24 PM, Richard Wordingham wrote: On Fri, 16 Sep 2011 18:59:47 Peter Edberg pedb...@apple.com wrote: However, it does not handle the situation in which the date is part of other text, and may be preceded

Re: Noticed improvement in the Code chart link http://www.unicode.org/charts/

2011-10-02 Thread Richard Wordingham
On Wed, 28 Sep 2011 14:47:49 +0530 (IST) delex r del...@indiatimes.com wrote: On 2011.09.27 22:56, delex r wrote: I hope a proposal will come in near future to include an additional letter 'Khya' which is as per our (Assamese) script is not considered as a biconsonantal conjunct as in

Re: Subj: Reporting error in the code chart for South Asian Scripts (Bengali)

2011-10-02 Thread Richard Wordingham
On Thu, 29 Sep 2011 15:31:41 +0530 (IST) delex r del...@indiatimes.com wrote: I am a bit confused whether a computer or say a microprocessor actually needs to know the characters as BENGALI LETTER .. for reconstructing/reproducing/displaying .. on the screen from the Hexadecimal codes

Re: definition of plain text

2011-10-16 Thread Richard Wordingham
On Sat, 15 Oct 2011 04:37:11 +0200 Peter Cyrus pcy...@alivox.net wrote: Ken, your explanation seems more permissive than I had anticipated. One particularity of this script is that it is written in different gaits, depending on the phonology of the language. Languages with open syllables,

Re: Arabic date format and Microsoft programs

2011-10-16 Thread Richard Wordingham
On Sat, 15 Oct 2011 17:19:29 +0200 (CEST) Andreas Prilop prilop4...@trashmail.net wrote: I return to http://www.unicode.org/mail-arch/unicode-ml/y2011-m10/att-0059/1999-12-31.html Microsoft programs (Internet Explorer, MS Word), display this as 31/12/1999 Other programs (Firefox,

Re: definition of plain text

2011-10-16 Thread Richard Wordingham
On Sun, 16 Oct 2011 21:37:20 +0200 Peter Cyrus pcy...@alivox.net wrote: Perhaps, awkwardly. But that is ultimately equivalent to marking the gait on every letter, in which case I probably wouldn't need to distinguish between initial and non-initial letters. If you allow C(R)V(C) as a 'fixed'

Re: Arabic date format and Microsoft programs

2011-10-17 Thread Richard Wordingham
On Mon, 17 Oct 2011 05:57:33 +0200 Eli Zaretskii e...@gnu.org wrote: Date: Sun, 16 Oct 2011 22:47:08 +0100 From: Richard Wordingham richard.wording...@ntlworld.com List-software: Ecartis version 1.0.0 HTML 4.0 and 4.0.1 Section 8.2 Paragraph 3 Section 2 states, If a document does

Re: Combining latin small letters with diacritics

2012-03-06 Thread Richard Wordingham
On Mon, 5 Mar 2012 14:26:43 -0600 (CST) Benjamin M Scarborough benjamin.scarboro...@utdallas.edu wrote: Are you suggesting a LATIN SIGN VIRAMA? The problem with LATIN SIGN COENG and LATIN SIGN INVERSE COENG is that they are too late - there are characters around that should decompose to contain

Re: Key Curry : Attempting to make it easy to type world languages and orthographies on the web

2012-04-22 Thread Richard Wordingham
On Tue, 17 Apr 2012 17:40:59 -0400 Ed Trager ed.tra...@gmail.com wrote: Please check it out and provide me feedback: http://unifont.org/keycurry/ My quick look was done on Ubuntu 10.04 using Firefox 11.0 Canonical-1.0 with a UK keyboard, with the mapping set to GB keyboard unless otherwise

Re: Key Curry : Attempting to make it easy to type world languages and orthographies on the web

2012-04-23 Thread Richard Wordingham
On Mon, 23 Apr 2012 15:49:29 -0400 Ed Trager ed.tra...@gmail.com wrote: Please note that there are some encoding questions mixed in with observations on the application. (Observation 3 from before) Key Curry however needs to implement a generic solution across all scripts for displaying

Re: Key Curry : Correction about MSKLC

2012-04-23 Thread Richard Wordingham
On Tue, 24 Apr 2012 01:11:15 +0100 Richard Wordingham richard.wording...@ntlworld.com wrote: (I use AltGr in the Windows (MSKLC - not the latest technology) and X mappings, but then I lose some or all of the 'ligatures'... Correction: The loss is just with X. Windows (MSKLC) supports AltGr

Re: Kaktovik Inupiaq numerals

2012-04-27 Thread Richard Wordingham
On Thu, 26 Apr 2012 22:32:09 -0700 David Starner prosfil...@gmail.com wrote: The proposal seems trivial, except for the minor problem of establishing sufficient use to justify encoding. If they are to be adopted by the CLDR, the digits need to be coded consecutively. However, the symbols for

Re: Kaktovik Inupiaq numerals

2012-04-28 Thread Richard Wordingham
On Fri, 27 Apr 2012 13:50:15 -0700 Ken Whistler k...@sybase.com wrote: On 4/27/2012 10:45 AM, Richard Wordingham wrote: If they are to be adopted by the CLDR, the digits need to be coded consecutively. I doubt this matters in any case, because this proposed use is for a vigesimal system

Encoding of Numbers Composed of Decimal Digits (General Category of Nd)

2012-04-28 Thread Richard Wordingham
Is it anywhere stated as policy that numbers written by a string of decimal digits will be encoded with the most significant digit first in storage order? I couldn't find it stated anywhere. As positional notation only seems to have been invented and propagated once or twice (Babylonian and
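
As an illustration of the convention being asked about, software commonly assumes that a run of Nd digits is stored most significant digit first. The sketch below shows that assumption in Python; `decimal_value` is a hypothetical helper, and the behaviour shown is Python's, not a statement of Unicode policy.

```python
import unicodedata


def decimal_value(digits: str) -> int:
    """Interpret a string of Nd digits, most significant digit first."""
    value = 0
    for ch in digits:
        # unicodedata.decimal raises ValueError for non-decimal characters
        value = value * 10 + unicodedata.decimal(ch)
    return value


thai = "\u0E51\u0E52\u0E53"          # THAI DIGIT ONE, TWO, THREE
arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGIT ONE, TWO, THREE

print(decimal_value(thai), decimal_value(arabic_indic))
# Python's built-in int() makes the same storage-order assumption.
print(int(thai), int(arabic_indic))
```

Both the helper and `int()` read the first stored digit as the most significant one, including for digits used in right-to-left scripts.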

Writing Babylonian Numbers in Unicode

2012-04-28 Thread Richard Wordingham
Is there any recommendation on how to write Babylonian numbers in Unicode? I use the usual scheme of using the DISH series for the units and the U series for the tens. One problem with the Cuneiform Numbers and Punctuation block is that there is no cross reference for the low numbers. However,

Re: Unicode, SMS and year 2012

2012-04-28 Thread Richard Wordingham
On Fri, 27 Apr 2012 11:21:05 -0700 Doug Ewell d...@ewellic.org wrote: SCSU works equally well, or almost so, with any text sample where the non-ASCII characters fit into a single block of 128 code points. For anything other than Latin-1 you need one byte of overhead, to switch to another

Re: Unicode, SMS and year 2012 - SQU, not UQU

2012-04-28 Thread Richard Wordingham
On Sat, 28 Apr 2012 18:55:00 +0100 Richard Wordingham richard.wording...@ntlworld.com wrote: I wrote: With SCSU that avoids Unicode mode and UQU whenever possible, most alphabetic languages work fairly well. I meant: With SCSU that avoids Unicode mode and SQU whenever possible, most

Re: Encoding of Numbers Composed of Decimal Digits (General Category of Nd)

2012-04-30 Thread Richard Wordingham
On Mon, 30 Apr 2012 13:46:20 +0200 Michael Probst michael.probs...@web.de wrote: Am Samstag, den 28.04.2012, 13:18 +0100 schrieb Richard Wordingham: Is it anywhere stated as policy that numbers written by a string of decimal digits will be encoded with the most significant digit first

Re: Writing Babylonian Numbers in Unicode

2012-04-30 Thread Richard Wordingham
On Mon, 30 Apr 2012 13:51:27 +0200 Michael Probst michael.probs...@web.de wrote: Am Samstag, den 28.04.2012, 15:56 +0100 schrieb Richard Wordingham: However, there does not appear to be anything for *CUNEIFORM NUMERIC SIGN TWO U, for which one might expect *CUNEIFORM SIGN MAN (Borger 2003

Re: Writing Babylonian Numbers in Unicode

2012-05-01 Thread Richard Wordingham
On Mon, 30 Apr 2012 16:42:51 -0700 Ken Whistler k...@sybase.com wrote: On 4/30/2012 3:33 PM, Richard Wordingham wrote: One is not compelled to construct U+3039 (〹) ,twenty' from two U+3038 (〸) ,ten', so a CUNEIFORM TWO U may well be missing. It looks as though it is. No, it isn't
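
The Hangzhou numerals cited as a comparison have their numeric values recorded in the Unicode Character Database, which can be read with Python's `unicodedata` module. This is an illustrative lookup only and says nothing about the cuneiform repertoire itself.

```python
import unicodedata

HANGZHOU_TEN = "\u3038"     # U+3038 HANGZHOU NUMERAL TEN
HANGZHOU_TWENTY = "\u3039"  # U+3039 HANGZHOU NUMERAL TWENTY

ten = unicodedata.numeric(HANGZHOU_TEN)
twenty = unicodedata.numeric(HANGZHOU_TWENTY)

print(unicodedata.name(HANGZHOU_TEN), ten)
print(unicodedata.name(HANGZHOU_TWENTY), twenty)
# 'Twenty' is a separately encoded character, not a pair of 'ten' characters.
print(twenty == 2 * ten)
```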

Re: Possible 'Normal' Nested Contractions for Collation

2012-05-08 Thread Richard Wordingham
On Tue, 8 May 2012 09:05:49 -0700 Markus Scherer markus@gmail.com wrote: On Tue, May 8, 2012 at 5:16 AM, Wordingham, Richard (UK) richard.wording...@mbda-systems.com wrote: The context is a discussion of whether it is necessary in the UCA (collation) spec to support interleaved

Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-15 Thread Richard Wordingham
I am puzzled as to how an implementation can compliantly implement the tailoring of normalisation in the UCA. Can an implementation be said to compliantly implement the tailoring of normalisation if nominally turning it off actually has no effect? If it can, my puzzlement goes away. Simply

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-16 Thread Richard Wordingham
On Tue, 15 May 2012 21:33:03 -0700 Markus Scherer markus@gmail.com wrote: On Tue, May 15, 2012 at 4:42 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: I am puzzled as to how an implementation can compliantly implement the tailoring of normalisation in the UCA. I think

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-16 Thread Richard Wordingham
On Wed, 16 May 2012 09:17:51 -0700 Markus Scherer markus@gmail.com wrote: On Wed, May 16, 2012 at 1:24 AM, Richard Wordingham richard.wording...@ntlworld.com wrote: Section 5.1 of the UCA says that one may have a parametric normalisation tailoring. Section 5.1 is about runtime

Mark-Driven Script Categorisation

2012-05-16 Thread Richard Wordingham
On Wed, 16 May 2012 15:32:31 -0700 Ken Whistler k...@sybase.com wrote: On 5/16/2012 2:54 PM, Richard Wordingham wrote: I have been wondering if U+0078 LATIN SMALL LETTER X should be made common script because of its use for displaying Lao vowels, but perhaps the principle of separation

Re: Mark-Driven Script Categorisation

2012-05-17 Thread Richard Wordingham
On Thu, 17 May 2012 20:41:19 +0200 Philippe Verdy verd...@wanadoo.fr wrote: Is it really the Latin letter x in question there, if its use is to be a visible placeholder to hold diacritic vowel marks? The Latin letter has the problem of its dual case (not found in the Lao script, and a too

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-17 Thread Richard Wordingham
On Wed, 16 May 2012 16:03:08 -0700 Markus Scherer markus@gmail.com wrote: The problem is a contraction x+0F72 and input text x+0F73 where the inner 0F71 should be skipped. We can avoid this by adding a contraction for x+0F73 (and one for the equivalent x+0F71+0F72). On the other hand,
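
The decompositions behind this problem can be checked with Python's `unicodedata` module: U+0F73 decomposes canonically to 0F71 followed by 0F72, so an 0F71 sits between the base and the 0F72 that the contraction wants to match. The sketch below only displays the normalisation data, with 'x' standing for an arbitrary base as in the message; it does not implement collation.

```python
import unicodedata


def nfd_code_points(text: str) -> list[str]:
    """Return the NFD form of text as a list of hex code points."""
    return [f"{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]


def ccc(code_point: int) -> int:
    """Return the canonical combining class of a code point."""
    return unicodedata.combining(chr(code_point))


print(nfd_code_points("x\u0F73"))  # base + TIBETAN VOWEL SIGN II
print(nfd_code_points("\u0F81"))   # TIBETAN VOWEL SIGN REVERSED II
for cp in (0x0F71, 0x0F72, 0x0F74, 0x0F80, 0x0334):
    print(f"ccc({cp:04X}) = {ccc(cp)}")
```

Because 0F71 has a lower, non-zero combining class than 0F72, it sorts before 0F72 in NFD and can be skipped by discontiguous matching.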

Mark-Driven Script Categorisation (was: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm)

2012-05-17 Thread Richard Wordingham
On Wed, 16 May 2012 21:46:17 -0700 Mark Davis ☕ m...@macchiato.com wrote: No, it's not. Including x in Lao for some pedagogical (I'm guessing) purpose is completely out of scope. That'd be like including π in Latin because it sometimes occurs in the middle of English text. No, it's more

Re: Mark-Driven Script Categorisation

2012-05-17 Thread Richard Wordingham
On Thu, 17 May 2012 22:14:55 +0200 Philippe Verdy verd...@wanadoo.fr wrote: It has x just like the rest of the Basic Latin alphabet, in one of its input modes. Which keyboard layout are you looking at? When present, it's usually got by pressing SHIFT and the key used for U+0EAD LAO LETTER O.

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-17 Thread Richard Wordingham
On Thu, 17 May 2012 13:39:08 -0700 Markus Scherer markus@gmail.com wrote: On Thu, May 17, 2012 at 1:02 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: As x = 0F71, we also need the contractions of x+0F73 (or x+0F71+0F72) with 0F72, 0F74 and 0F80 to give the pair

Re: Mark-Driven Script Categorisation

2012-05-17 Thread Richard Wordingham
On Thu, 17 May 2012 22:56:51 +0200 Philippe Verdy verd...@wanadoo.fr wrote: Oh well... then the next time we'll discuss about including the Han sinograms in the Latin script because we find discussions in English about these sinograms. Then we'll start mixing all scripts together as if they

Re: Mark-Driven Script Categorisation

2012-05-17 Thread Richard Wordingham
On Thu, 17 May 2012 23:16:10 +0200 Philippe Verdy verd...@wanadoo.fr wrote: OK, OK. So this looks like there's an 'x'-like letter in the Lao script. But why should it be the Latin letter with all its allowed variations, its dual case, its cursive joining, its serifs? Maybe the letter

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-17 Thread Richard Wordingham
On Thu, 17 May 2012 15:42:37 -0700 Markus Scherer markus@gmail.com wrote: On Thu, May 17, 2012 at 3:00 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: HOWEVER, you must *not* have the added contraction for 0F71+0F71. If we don't have this prefix contraction, then we

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Richard Wordingham
On Thu, 17 May 2012 21:32:19 -0700 Markus Scherer markus@gmail.com wrote: On Thu, May 17, 2012 at 4:29 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: As I've already said, DUCET 6.1.0 omits a contraction for 0FB2+0F71, and so CE(0FB2, 0334, 0F71, 0F80) = CE(0FB2+0F80

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Richard Wordingham
On Thu, 17 May 2012 21:32:19 -0700 Markus Scherer markus@gmail.com wrote: Ok, but assuming we didn't add 0FB2+0F71, why can't we add the contraction 0FB2+0F81 and have the 0334 and any other non-starter be handled via discontiguous matching? Time for me to make a pronouncement on

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Richard Wordingham
On Fri, 18 May 2012 09:51:34 -0700 Markus Scherer markus@gmail.com wrote: There is nothing that requires us to get correct results *without normalization* for all FCD strings or any other particular input conditions (except NFD input). So long as you don't claim conformance to the CLDR

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Richard Wordingham
On Fri, 18 May 2012 09:51:34 -0700 Markus Scherer markus@gmail.com wrote: On inspection, we think we can do better (and want to), probably by adding overlap contractions. If we get into trouble with that, we will think of alternatives. One is to decompose more characters even in FCD

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-20 Thread Richard Wordingham
On Sat, 19 May 2012 01:12:17 +0100 Richard Wordingham richard.wording...@ntlworld.com wrote: Just in case you haven't already thought of it, one reasonable scheme would be to decompose input if and only if searching for contractions or the input character could *hide* the start

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-20 Thread Richard Wordingham
On Sun, 20 May 2012 16:15:24 +0100 Richard Wordingham richard.wording...@ntlworld.com wrote: CORRECTION: For the general case, we ought to be able to express a rule such as 'ignore the countering of soft-dottedness', as in Lithuanian casing, but I don't see any finite method of expressing

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-20 Thread Richard Wordingham
On Sun, 20 May 2012 17:05:00 +0100 Richard Wordingham richard.wording...@ntlworld.com wrote: CORRECTION to correction I wrote rules for soft-dotted indecomposable+0307+ccc=203 when, of course, I meant rules for soft-dotted indecomposable+0307+ccc=230 Sorry about that. Richard.

CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-21 Thread Richard Wordingham
What are the definitions of upper and lower case for the caseFirst tailoring for the UCA and for LDML? I can't find any obvious definition. My suspicion is that they are defined by assignment of the DUCET tertiary weights, UTS#10 Issue 23 (Version 6.1.0) Section 7.2. Although these largely

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-21 Thread Richard Wordingham
On Mon, 21 May 2012 17:43:27 -0700 Ken Whistler k...@sybase.com wrote: For example, when caseFirst is set to uppercase, ICU orders U+1D34 MODIFIER LETTER CAPITAL H before U+0068 LATIN SMALL LETTER H, but anomalously orders U+A7F8 MODIFIER LETTER CAPITAL H WITH STROKE *after* U+0127 LATIN

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-21 Thread Richard Wordingham
On Sat, 19 May 2012 01:12:17 +0100 Richard Wordingham richard.wording...@ntlworld.com wrote: This will then work for DUCET 6.1.0, work for Danish, and work for my mischievous 0302 COMBINING CIRCUMFLEX ACCENT+0067 LATIN SMALL LETTER G contraction. There is a very similar rule in CLDR

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-22 Thread Richard Wordingham
On Mon, 21 May 2012 17:07:33 -0700 Markus Scherer markus@gmail.com wrote: In principle, it's straightforward: Lowercase and uppercase follow Unicode (UCD) case properties. We distinguish an intermediate mixed case for titlecase characters and mixed-case contractions. I believe we also

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-22 Thread Richard Wordingham
On Tue, 22 May 2012 08:33:43 -0700 Markus Scherer markus@gmail.com wrote: On Tue, May 22, 2012 at 1:09 AM, Richard Wordingham richard.wording...@ntlworld.com wrote: On Mon, 21 May 2012 17:07:33 -0700 Markus Scherer markus@gmail.com wrote: I can dig up the ICU code

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Richard Wordingham
On Wed, 23 May 2012 10:35:46 -0700 Markus Scherer markus@gmail.com wrote: On Tue, May 22, 2012 at 2:22 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: I found the code that computes the case bits (2 bits for lower/mixed/upper) for building ICU tailorings. Search

Re: Unicode 6.2 to Support the Turkish Lira Sign

2012-05-23 Thread Richard Wordingham
On Wed, 23 May 2012 11:07:32 +0100 Michael Everson ever...@evertype.com wrote: On 23 May 2012, at 09:41, Szelp, A. Sz. wrote: We can wait and see whether there's need or real basis for disunification. The basis for disunification is that it is a major glyph change, making it quite

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Richard Wordingham
On Wed, 23 May 2012 15:50:24 -0700 Markus Scherer markus@gmail.com wrote: On Wed, May 23, 2012 at 2:01 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: While we're picking on that poor routine - it looks as though it could come unstuck with kana in the supplementary

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Richard Wordingham
On Wed, 23 May 2012 17:47:09 -0700 Markus Scherer markus@gmail.com wrote: On Wed, May 23, 2012 at 5:17 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: The order of code points and contractions as listed in FractionalUCA.txt and allkeys.txt should be the same, except

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Richard Wordingham
On Wed, 23 May 2012 15:50:24 -0700 Markus Scherer markus@gmail.com wrote: On Wed, May 23, 2012 at 2:01 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: Is there a definition of the precise relationship between DUCET and FractionalUCA.txt, or does FractionalUCA.txt

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-24 Thread Richard Wordingham
On Wed, 23 May 2012 17:47:09 -0700 Markus Scherer markus@gmail.com wrote: Also, I just saw that http://www.unicode.org/Public/UCA/latest/CollationAuxiliary.zip contains allkeys_CLDR.txt which should correspond 1:1 with the FractionalUCA*.txt in the same .zip file. One format difference:

Discontiguous Collation Grapheme Clusters

2012-05-27 Thread Richard Wordingham
I'm currently reviewing the definition of the Unicode Collation Algorithm (as opposed to just trying to comply with it), and I came across the concept of collation grapheme clusters, defined in UTS#18 'Unicode Regular Expressions'. For what types of strings are they supposed to be defined? Any?

Re: [OT] Re: Exact positioning of Indian Rupee symbol according to Unicode Technical Committee

2012-05-30 Thread Richard Wordingham
On Tue, 29 May 2012 12:52:12 -0700 Doug Ewell d...@ewellic.org wrote: And yes, of course it's possible to stack an entire new layer on top of the existing Windows key architecture, as Keyman does. Maybe that is the long-term solution, but I haven't heard that MS is planning to go that route.

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-07-04 Thread Richard Wordingham
On Fri, 25 May 2012 12:34:01 -0700 Markus Scherer markus@gmail.com wrote: On Thu, May 24, 2012 at 5:36 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: I spotted two differences flicking through the end of the differences - Nice work! Please submit your findings via

Sorting Pali in Tibetan Script

2012-07-07 Thread Richard Wordingham
Can someone please advise me as to the sorting of Pali as Pali in Tibetan script. I need a prompt response rather than a complete treatment. It is possible that I have misunderstood what I have been able to pull together. What I understand is the following: (a) The retroflex lateral

Re: ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS

2012-07-07 Thread Richard Wordingham
On Sat, 7 Jul 2012 20:39:34 +0100 (BST) Satyakam Phukan sphukan2...@yahoo.co.uk wrote: Isn't the correct way of translating 'BENGALI' in Character names into Assamese to use the word normally used to mean Assamese? What problems does this approach leave? Don't you think the Mons are

Re: Sorting Pali in Tibetan Script

2012-07-07 Thread Richard Wordingham
On Sat, 7 Jul 2012 17:43:41 -0500 Naena Guru naenag...@gmail.com wrote: This is the Pali sorting order in PTS Pali. The Last letter is the retroflex L: a ā i ī u ū e o aṃ aaṃ iṃ iiṃ uṃ uuṃ eṃ oṃ k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ t th d dh n p ph b bh m y r l v s h ḷ
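The order quoted above can be turned into a sort key fairly directly. What follows is only a minimal sketch, not anything taken from the thread: the names PALI_ORDER and pali_key are made up for illustration, and the greedy longest-match tokenising (so that "kh" or "aṃ" counts as a single unit) is an assumption about how the list is meant to be read.

```python
import unicodedata

# Order copied from the quoted message; everything else here is an assumption.
PALI_ORDER = (
    "a ā i ī u ū e o aṃ aaṃ iṃ iiṃ uṃ uuṃ eṃ oṃ "
    "k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ t th d dh n "
    "p ph b bh m y r l v s h ḷ"
).split()


def _nfc(text):
    """Normalise to NFC so precomposed and decomposed input compare alike."""
    return unicodedata.normalize("NFC", text)


# Rank of each unit in the quoted order (0 = sorts first).
RANK = {_nfc(unit): index for index, unit in enumerate(PALI_ORDER)}
MAX_LEN = max(len(unit) for unit in RANK)


def pali_key(word):
    """Return a list of ranks, matching the longest listed unit at each step."""
    text = _nfc(word.lower())
    key = []
    i = 0
    while i < len(text):
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in RANK:
                key.append(RANK[chunk])
                i += len(chunk)
                break
        else:
            # Characters outside the list sort after every listed unit.
            key.append(len(RANK) + ord(text[i]))
            i += 1
    return key


words = ["ḷa", "ha", "kha", "ka", "ā", "a"]
print(sorted(words, key=pali_key))
```

Under these assumptions the retroflex ḷ sorts after h, as the quoted order has it, and ṭ sorts before t. A real implementation would more likely be a UCA tailoring than a hand-rolled key.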

Re: ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS

2012-07-08 Thread Richard Wordingham
On Sun, 8 Jul 2012 17:31:59 +0530 Shriramana Sharma samj...@gmail.com wrote: And you will certainly agree that a non-native cannot immediately know what is the significance of the Indic character names DA vs DDA (vs DDDA or A), SSA, RRA, NNA, NNNA, LLA, LLLA and so on! :-) On the

Re: ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS

2012-07-08 Thread Richard Wordingham
On Sun, 8 Jul 2012 18:44:41 +0530 Shriramana Sharma samj...@gmail.com wrote: On Sun, Jul 8, 2012 at 6:32 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: On the contrary, doubling for (historical) retroflexion is a fairly clear convention. Where, please? I have never heard

Unicode 6.2.0 Beta Collation Tests

2012-07-08 Thread Richard Wordingham
Are the collation tests meant to have been updated for the change in the draft of Step 2.1 of the collation algorithm? I haven't changed what I believe to be a UCA 6.1.0-compliant implementation, yet my code now passes the 6.2.0 tests for both DUCET and CLDR root. (I understand that the error in

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-09 Thread Richard Wordingham
On Mon, 9 Jul 2012 10:39:52 +0200 Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote: Jukka K. Korpela, Mon, 09 Jul 2012 10:04:08 +0300: Adding new characters would be possible in principle, but hardly realistic or useful in this case. They would not change the bulk of existing

Re: Romanized Singhala - Think about it again

2012-07-10 Thread Richard Wordingham
On Mon, 09 Jul 2012 05:20:45 +0200 Jean-François Colson j...@colson.eu wrote: On 09/07/12 01:29, Naena Guru wrote: Number of letters in Singhala is only theoretical. In the case of Singhala orthography, the actually used number depends on the Sanskrit vocabulary. Do you mean there
