On Sat, 18 Sep 2010 00:06:07 +0100
Krishna Birth krishnabi...@gmail.com wrote:
Could someone please correctly tell the codes to use on Unix operating
systems to produce the below diacritics:
A
Ā = http://www.fileformat.info/info/unicode/char/0100/index.htm
...
I need to find this for a
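A minimal sketch, not part of the original message: the code point behind the link above can be checked from Python's standard `unicodedata` module. It does not answer the keyboard question itself. It only shows that Ā can be entered either as the single code point U+0100 or as A followed by U+0304 COMBINING MACRON, and that the two are canonically equivalent.

```python
import unicodedata

a_macron = "\u0100"  # the character from the linked page

# Official character name from the Unicode Character Database
name = unicodedata.name(a_macron)

# Canonical decomposition: base letter plus combining macron
decomposed = unicodedata.normalize("NFD", a_macron)
code_points = [f"U+{ord(c):04X}" for c in decomposed]

# Recomposing the two-code-point sequence gives back U+0100
recomposed = unicodedata.normalize("NFC", "A\u0304")

print(name, code_points, recomposed == a_macron)
```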
On Sun, 19 Sep 2010 19:39:35 +0100
Krishna Birth krishnabi...@gmail.com wrote:
Correction:
Could 7 characters to one key be possible?
On Sun, Sep 19, 2010 at 7:37 PM, Krishna Birth
krishnabi...@gmail.com wrote:
The diacritics are usually typed with non-diacritic letter. It
would be
On Sun, 26 Sep 2010 22:58:31 +0530
Vinodh Rajan vinodh.vin...@gmail.com wrote:
And I guess you are trying to mix characters from two different
scripts
- Latin and Devanagari.
Nope. He is using the Generic Combining Candrabindu 0310
Which I suspect is only actively supported for use
On Wed, 29 Jun 2011 03:49:42 +
Peter Constable peter...@microsoft.com wrote:
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org]
On Behalf Of Jean-François Colson
* In the C’HWERTY layout on Linux, the digraph and trigraph had to
be replaced by six PUA characters
On Fri, 1 Jul 2011 01:57:46 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
CGJ is NOT made to create (or even hint) ligatures; and certainly not
in this context.
Its main purpose is to indicate that a sequence of characters does
not form a collating unit. However, if one is using a 'monospace'
On Fri, 1 Jul 2011 04:22:59 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
2011/7/1 Richard Wordingham richard.wording...@ntlworld.com:
Its main purpose is to indicate that a sequence of characters does
not form a collating unit. However, if one is using a 'monospace'
font to space
On Sat, 2 Jul 2011 15:59:18 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
2011/7/1 Richard Wordingham richard.wording...@ntlworld.com:
I wonder if anyone has some statistics on the use of CGJ. Its
revised intended use was to disrupt collating sequences, but you
may be right about its
On Fri, 24 Jun 2011 18:24:01 +0530
Shriramana Sharma samj...@gmail.com wrote:
The point is that the sequence:
la, virama, candrabindu, la
is strictly speaking *the* sequence recommended *across* Indic
scripts for representation of Sanskrit clusters involving a nasal and
non-nasal
On Sun, 14 Aug 2011 19:59:30 +0530
Shriramana Sharma samj...@gmail.com wrote:
On 08/14/2011 06:02 PM, Richard Wordingham wrote:
On Fri, 24 Jun 2011 18:24:01 +0530
Shriramana Sharma samj...@gmail.com wrote:
The point is that the sequence:
la, virama, candrabindu, la
is strictly
On Sat, 6 Aug 2011 17:25:11 -0700
tulasi tulas...@gmail.com wrote:
- Why did Unicode Inc copy some letters/symbols from the Greek script
irresponsibly and rename them as Latin-script?
- Why didn't it (Unicode Inc) use the same Greek letters/symbols?
U+00B5 MICRO SIGN is an ISO-8859-1 character,
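As a hedged illustration of the relationship between the two characters (a sketch using Python's standard `unicodedata` module, not something from the thread): U+00B5 MICRO SIGN is a separate code point from U+03BC GREEK SMALL LETTER MU, but it carries a compatibility mapping to it, so NFKC normalization folds the two together while NFC keeps them distinct.

```python
import unicodedata

micro = "\u00B5"  # MICRO SIGN, inherited from ISO-8859-1
mu = "\u03BC"     # GREEK SMALL LETTER MU

micro_name = unicodedata.name(micro)
mu_name = unicodedata.name(mu)

# Canonical normalization leaves MICRO SIGN alone ...
nfc = unicodedata.normalize("NFC", micro)
# ... while compatibility normalization maps it to the Greek letter
nfkc = unicodedata.normalize("NFKC", micro)

print(micro_name, "|", mu_name, "|", nfc == micro, nfkc == mu)
```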
On Mon, 15 Aug 2011 07:21:20 +0530
Shriramana Sharma samj...@gmail.com wrote:
On 08/15/2011 01:48 AM, Richard Wordingham wrote:
The issue is the relative ordering of candrabindu and virama.
For a C1-conjoining form (i.e. C2 relatively unmodified), la virama
candrabindu la is easier
On Tue, 16 Aug 2011 23:32:51 +0100
Andrew West andrewcw...@gmail.com wrote:
Chris Fynn asked about certain non-standard stacks he was trying to
implement in the Tibetan Machine Uni font in an email to the Tibex
list on 2006-12-09, but these didn't involve multiple consonant-vowel
sequences
On Fri, 19 Aug 2011 22:14:17 +0700
Martin Hosken martin_hos...@sil.org wrote:
Therefore, I would suggest that a carefully allocated set of columns
for non L directionality PUA characters be encoded. This PUA doesn't
have to be big, with probably 1 column allocated per directionality.
I'm no
On Fri, 19 Aug 2011 17:03:41 -0700
Ken Whistler k...@sybase.com wrote:
O.k., so apparently we have a while to go before we have to start
worrying about the Y2K or IPv4 problem for Unicode. Call me again in
the year 2851, and we'll still have 5 years left to design a new
scheme and plan for the
On Sun, 21 Aug 2011 00:21:28 +
Doug Ewell d...@ewellic.org wrote:
The more I think of it, the more I like the idea of reassigning the
default BC of Plane 16 to 'R'. What would the arguments against this
be?
BC of 'AL'?
Richard.
On Sun, 21 Aug 2011 01:44:02 +
Doug Ewell d...@ewellic.org wrote:
The more I think of it, the more I like the idea of reassigning the
default BC of Plane 16 to 'R'. What would the arguments against this
be?
BC of 'AL'?
Would that really be a better default? I thought the main RTL
On Sun, 21 Aug 2011 11:00:26 -0600
Doug Ewell d...@ewellic.org wrote:
I think as soon as we start talking about this many scenarios, we are
no longer talking about what the *default* bidi class of the PUA (or
some part of it) should be. Instead, we are talking about being able
to specify
On Sun, 21 Aug 2011 23:55:46 +
Doug Ewell d...@ewellic.org wrote:
What's a LANGUAGE MARK?
There are *three* strong directionalities - 'L' left-to-right, 'AL'
right-to-left as in Arabic, 'R' right-to-left (as in Hebrew, I
suspect). 'AL' and 'R' have different effects on certain characters
On Sun, 21 Aug 2011 16:37:34 -0700
Asmus Freytag asm...@ix.netcom.com wrote:
Treating PUA characters as ON is very problematic - their display
would become context sensitive in unintended ways. No users of CJK
characters would think of using LRM characters, but if text is
inserted or viewed
On Mon, 22 Aug 2011 07:51:22 -0700
Doug Ewell d...@ewellic.org wrote:
Some PUA properties, like glyph shapes and maybe directionality, can
be stored in a font. Others, like numeric values and casing, might
not or cannot. An interchangeable format needs to be agreed upon for
the properties
On Mon, 22 Aug 2011 14:06:00 +0100 (BST)
William_J_G Overington wjgo_10...@btinternet.com wrote:
On Monday 22 August 2011, Andrew West andrewcw...@gmail.com wrote:
Can anyone think of a way to extend UTF-16 without adding new
surrogates or inventing a new general category?
Andrew
On Mon, 22 Aug 2011 20:58:23 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
The computing order of features should not then be:
- BiDi algorithm for reordering grapheme clusters
(I trust you mean the ordering of clusters relative to one another, not
the ordering within clusters.)
- font
On Tue, 23 Aug 2011 10:02:05 +0800
li bo libo@gmail.com wrote:
...But I don't know why the user must take
a paragraph as a unit to determine the embedding levels. Why can't I
shape the text first, then wrap the lines, and then determine the
embedding levels for characters within a line?
On Mon, 22 Aug 2011 16:18:56 -0700
Ken Whistler k...@sybase.com wrote:
How about Clause 12.5 of ISO/IEC 10646:
001B, 0025, 0040
You escape out of UTF-16 to ISO 2022, and then you can do whatever
the heck you want, including exchange and processing of complete
4-byte forms, with all the
On Wed, 24 Aug 2011 07:34:05 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
2011/8/24 Luke-Jr l...@dashjr.org:
On Tuesday, August 23, 2011 10:29:58 PM Philippe Verdy wrote:
Even the UTC could create its own PUA registry,
It won't. The best you can hope for is a list of registries.
Now
On Wed, 24 Aug 2011 08:02:42 -0700
Doug Ewell d...@ewellic.org wrote:
But some people seem to be dead serious about the need to go beyond
1.1 million code points, and are making dead-serious arguments that
we need to plan for it.
Those are two different claims. 'Never say never' is a useful
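For reference, the "1.1 million" figure can be reproduced with a little arithmetic. The sketch below is an illustration added here, not taken from the message; the variable names are arbitrary. It derives the size of the code space from the 17 planes and shows that the largest UTF-16 surrogate pair lands exactly on U+10FFFF.

```python
# 17 planes of 65,536 code points each
PLANES = 17
total_code_points = PLANES * 0x10000

# Surrogate code points are reserved for UTF-16 and are not scalar values
surrogates = 0xDFFF - 0xD800 + 1
scalar_values = total_code_points - surrogates

# Decode the largest possible surrogate pair (high 0xDBFF, low 0xDFFF)
high, low = 0xDBFF, 0xDFFF
max_code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(total_code_points, scalar_values, hex(max_code_point))
```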
On Wed, 24 Aug 2011 08:35:48 -0700
Doug Ewell d...@ewellic.org wrote:
UAX #44, Table 13 (Bidi_Class Values) includes the following
descriptions:
R - Right_To_Left - any strong right-to-left (non-Arabic-type)
character AL - Arabic_Letter - any strong right-to-left (Arabic-type)
character
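The two right-to-left classes quoted above can be inspected directly. This is a small sketch using Python's standard `unicodedata.bidirectional`; the sample characters are chosen here for illustration. The value printed for the private-use code point depends on the Unicode data bundled with the interpreter, so no expected value is given for it.

```python
import unicodedata

hebrew_alef = "\u05D0"  # HEBREW LETTER ALEF
arabic_alef = "\u0627"  # ARABIC LETTER ALEF
latin_a = "A"
pua = "\uE000"          # first code point of the BMP Private Use Area

bidi_hebrew = unicodedata.bidirectional(hebrew_alef)  # strong RTL, non-Arabic
bidi_arabic = unicodedata.bidirectional(arabic_alef)  # strong RTL, Arabic-type
bidi_latin = unicodedata.bidirectional(latin_a)       # strong LTR

print(bidi_hebrew, bidi_arabic, bidi_latin)
# Default class assigned to a private-use character by this Python build
print(unicodedata.bidirectional(pua))
```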
On Wed, 24 Aug 2011 12:40:54 -0700
Ken Whistler k...@sybase.com wrote:
On 8/24/2011 10:48 AM, Richard Wordingham wrote:
if, say,
code points are squandered.
Oh.
Well, in that case, the correct action is to work to ensure that code
points are not squandered.
Have there not already
On Sat, 3 Sep 2011 09:39:34 +0600
Christopher Fynn chris.f...@gmail.com wrote:
You can find quite a few non-standard stacks (those used in Tibetan
abbreviations) in the book བསྡུ་ཡིག་གསེར་གྱི་ཨ་ལོང། which is freely
available in PDF format from
On Sat, 10 Sep 2011 12:33:47 +0600
Christopher Fynn chris.f...@gmail.com wrote:
Characters only used for writing Assamese in the Bengali block are
similar. As long as you can type all the characters necessary for
writing your language, don't worry about names.
Actually, names sometimes
On Sat, 10 Sep 2011 22:19:27 +0200
Kent Karlsson kent.karlsso...@telia.com wrote:
Den 2011-09-10 20:58, skrev Jukka K. Korpela jkorp...@cs.tut.fi:
According to Oxford Style
Manual, one should not use the fi ligature in Turkish, as that
would obscure the distinction between normal i and
On Sat, 10 Sep 2011 23:53:34 +0200
Kent Karlsson kent.karlsso...@telia.com wrote:
IMO, a glyph (if any) for that compatibility character should look
*exactly* like an fi (after automatic ligature formation, if that
is done for fi) in the font used. So if no ligature for fi is
formed, the
On Sun, 11 Sep 2011 23:14:04 +0200
Kent Karlsson kent.karlsso...@telia.com wrote:
Den 2011-09-11 18:53, skrev Peter Constable
peter...@microsoft.com:
Hence, in a monospaced font, FB01 certainly should look different
from 0066,
0069, regardless of whether ligature glyphs are used in
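As background to the FB01 versus 0066 0069 comparison, here is a short sketch (an illustration added here, using only Python's standard `unicodedata`) of how the compatibility character relates to the plain letter sequence: they are not canonically equivalent, but compatibility normalization maps the ligature to the two letters.

```python
import unicodedata

ligature = "\uFB01"       # LATIN SMALL LIGATURE FI
letters = "\u0066\u0069"  # plain "f" followed by "i"

# Canonical normalization keeps the ligature as a single code point
nfc = unicodedata.normalize("NFC", ligature)

# Compatibility normalization replaces it with the two letters
nfkc = unicodedata.normalize("NFKC", ligature)

print(len(nfc), len(nfkc), nfkc == letters)
```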
This is a summary of what I have already submitted for Public Review
Issue 205 (http://www.unicode.org/review/pri205/). I am mentioning it
here in case there is something wrong with my idea.
My basic idea is that one does not need a 'level direction mark'. The
desired effect can be achieved by
On Wed, 14 Sep 2011 03:31:14 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
In other words, the UTC policy about the stability of Bidi classes
should be minimally relaxed, by rewording into something like:
« The bidi class property value of any assigned code point is
IMMUTABLE (and will
to the application of Rule W7 in the
UBA do not ligate or kern with non-neutrals.
(B) Non-displaying runs embedded within other runs have no effect on
the display.
I can make the conversion tables available on request.
Second, responses to some of the suggestions/comments:
1. Richard Wordingham
On Mon, 19 Sep 2011 05:44:27 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
2011/9/19 Peter Edberg pedb...@apple.com:
snip The whole point
of LDM was to be able to create semi-structured elements such as
the example in UAX #9 section 5.6 *without* knowing in advance
the direction
On Tue, 20 Sep 2011 01:48:45 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
2011/9/20 Richard Wordingham richard.wording...@ntlworld.com:
Because it also has practical applications (for example look at the
current Wikimedia bug when it wants to display lists of category
names, and insert
On Sun, 18 Sep 2011 20:21:38 Peter Edberg pedb...@apple.com wrote:
On Sep 17, 2011, at 7:24 PM, Richard Wordingham wrote:
On Fri, 16 Sep 2011 18:59:47 Peter Edberg pedb...@apple.com wrote:
However, it does not handle the situation in which the date is part
of other text, and may be preceded
On Wed, 28 Sep 2011 14:47:49 +0530 (IST)
delex r del...@indiatimes.com wrote:
On 2011.09.27 22:56, delex r wrote:
I hope a proposal will come in near future to include an additional
letter 'Khya' which, as per our (Assamese) script, is not considered
a biconsonantal conjunct as in
On Thu, 29 Sep 2011 15:31:41 +0530 (IST)
delex r del...@indiatimes.com wrote:
I am a bit confused whether a computer or say a microprocessor
actually needs to know the characters as BENGALI LETTER .. for
reconstructing/reproducing/displaying .. on the screen from the
Hexadecimal codes
On Sat, 15 Oct 2011 04:37:11 +0200
Peter Cyrus pcy...@alivox.net wrote:
Ken, your explanation seems more permissive than I had anticipated.
One particularity of this script is that it is written in different
gaits, depending on the phonology of the language. Languages with
open syllables,
On Sat, 15 Oct 2011 17:19:29 +0200 (CEST)
Andreas Prilop prilop4...@trashmail.net wrote:
I return to
http://www.unicode.org/mail-arch/unicode-ml/y2011-m10/att-0059/1999-12-31.html
Microsoft programs (Internet Explorer, MS Word), display this as
31/12/1999
Other programs (Firefox,
On Sun, 16 Oct 2011 21:37:20 +0200
Peter Cyrus pcy...@alivox.net wrote:
Perhaps, awkwardly. But that is ultimately equivalent to marking the
gait on every letter, in which case I probably wouldn't need to
distinguish between initial and non-initial letters.
If you allow C(R)V(C) as a 'fixed'
On Mon, 17 Oct 2011 05:57:33 +0200
Eli Zaretskii e...@gnu.org wrote:
Date: Sun, 16 Oct 2011 22:47:08 +0100
From: Richard Wordingham richard.wording...@ntlworld.com
List-software: Ecartis version 1.0.0
HTML 4.0 and 4.0.1 Section 8.2 Paragraph 3 Section 2 states, If a
document does
On Mon, 5 Mar 2012 14:26:43 -0600 (CST)
Benjamin M Scarborough benjamin.scarboro...@utdallas.edu wrote:
Are you suggesting a LATIN SIGN VIRAMA?
The problem with LATIN SIGN COENG and LATIN SIGN INVERSE COENG is that
they are too late - there are characters around that should decompose to
contain
On Tue, 17 Apr 2012 17:40:59 -0400
Ed Trager ed.tra...@gmail.com wrote:
Please check it out and provide me feedback:
http://unifont.org/keycurry/
My quick look was done on Ubuntu 10.04 using Firefox 11.0 Canonical-1.0
with a UK keyboard, with the mapping set to GB keyboard unless
otherwise
On Mon, 23 Apr 2012 15:49:29 -0400
Ed Trager ed.tra...@gmail.com wrote:
Please note that there are some encoding questions mixed in with
observations on the application.
(Observation 3 from before)
Key Curry however needs to implement a generic solution across all
scripts for displaying
On Tue, 24 Apr 2012 01:11:15 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
(I use AltGr in the Windows (MSKLC - not the latest
technology) and X mappings, but then I lose some or all of the
'ligatures'...
Correction: The loss is just with X. Windows (MSKLC) supports AltGr
On Thu, 26 Apr 2012 22:32:09 -0700
David Starner prosfil...@gmail.com wrote:
The proposal seems trivial, except for the
minor problem of establishing sufficient use to justify encoding.
If they are to be adopted by the CLDR, the digits need to be coded
consecutively. However, the symbols for
On Fri, 27 Apr 2012 13:50:15 -0700
Ken Whistler k...@sybase.com wrote:
On 4/27/2012 10:45 AM, Richard Wordingham wrote:
If they are to be adopted by the CLDR, the digits need to be coded
consecutively.
I doubt this matters in any case, because this proposed use is for
a vigesimal system
Is it anywhere stated as policy that numbers written by a string of
decimal digits will be encoded with the most significant digit first in
storage order? I couldn't find it stated anywhere.
As positional notation only seems to have been invented and propagated
once or twice (Babylonian and
Is there any recommendation on how to write Babylonian numbers in
Unicode? I use the usual scheme of using the DISH series
for the units and the U series for the tens.
One problem with the Cuneiform Numbers and Punctuation block is that
there is no cross reference for the low numbers. However,
On Fri, 27 Apr 2012 11:21:05 -0700
Doug Ewell d...@ewellic.org wrote:
SCSU works equally well, or almost so, with any text sample where the
non-ASCII characters fit into a single block of 128 code points. For
anything other than Latin-1 you need one byte of overhead, to switch
to another
On Sat, 28 Apr 2012 18:55:00 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
I wrote:
With SCSU that avoids Unicode mode and UQU whenever possible, most
alphabetic languages work fairly well.
I meant:
With SCSU that avoids Unicode mode and SQU whenever possible, most
On Mon, 30 Apr 2012 13:46:20 +0200
Michael Probst michael.probs...@web.de wrote:
Am Samstag, den 28.04.2012, 13:18 +0100 schrieb Richard Wordingham:
Is it anywhere stated as policy that numbers written by a string of
decimal digits will be encoded with the most significant digit
first
On Mon, 30 Apr 2012 13:51:27 +0200
Michael Probst michael.probs...@web.de wrote:
Am Samstag, den 28.04.2012, 15:56 +0100 schrieb Richard Wordingham:
However, there does not appear to be anything for *CUNEIFORM NUMERIC
SIGN TWO U, for which one might expect *CUNEIFORM SIGN MAN (Borger
2003
On Mon, 30 Apr 2012 16:42:51 -0700
Ken Whistler k...@sybase.com wrote:
On 4/30/2012 3:33 PM, Richard Wordingham wrote:
One is not compelled to construct U+3039 (〹) 'twenty' from two U+3038
(〸) 'ten', so a CUNEIFORM TWO U may well be missing.
It looks as though it is.
No, it isn't
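The U+3038 and U+3039 comparison can be checked against the character database. The sketch below is added for illustration and uses Python's standard `unicodedata`. It shows that each of the two characters carries its own numeric value, and that canonical normalization does not rewrite 'twenty' as two 'tens'.

```python
import unicodedata

ten = "\u3038"     # HANGZHOU NUMERAL TEN
twenty = "\u3039"  # HANGZHOU NUMERAL TWENTY

ten_value = unicodedata.numeric(ten)
twenty_value = unicodedata.numeric(twenty)

# 'Twenty' is an atomic character; NFD leaves it as one code point
twenty_nfd = unicodedata.normalize("NFD", twenty)

print(unicodedata.name(ten), ten_value)
print(unicodedata.name(twenty), twenty_value)
print(twenty_nfd == twenty, twenty_nfd == ten + ten)
```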
On Tue, 8 May 2012 09:05:49 -0700
Markus Scherer markus@gmail.com wrote:
On Tue, May 8, 2012 at 5:16 AM, Wordingham, Richard (UK)
richard.wording...@mbda-systems.com wrote:
The context is a discussion of whether it is necessary in the UCA
(collation) spec to support interleaved
I am puzzled as to how an implementation can compliantly implement the
tailoring of normalisation in the UCA.
Can an implementation be said to compliantly implement the tailoring of
normalisation if nominally turning it off actually has no effect? If
it can, my puzzlement goes away.
Simply
On Tue, 15 May 2012 21:33:03 -0700
Markus Scherer markus@gmail.com wrote:
On Tue, May 15, 2012 at 4:42 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
I am puzzled as to how an implementation can compliantly implement
the tailoring of normalisation in the UCA.
I think
On Wed, 16 May 2012 09:17:51 -0700
Markus Scherer markus@gmail.com wrote:
On Wed, May 16, 2012 at 1:24 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
Section 5.1 of the UCA says that one may have a parametric
normalisation tailoring.
Section 5.1 is about runtime
On Wed, 16 May 2012 15:32:31 -0700
Ken Whistler k...@sybase.com wrote:
On 5/16/2012 2:54 PM, Richard Wordingham wrote:
I have been wondering if U+0078 LATIN
SMALL LETTER X should be made common script because of its use for
displaying Lao vowels, but perhaps the principle of separation
On Thu, 17 May 2012 20:41:19 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
Is it really the Latin letter x in question there, if its use is to
be a visible placeholder to hold diacritic vowel marks? The Latin
letter has the problem of its dual case (not found in the Lao script,
and a too
On Wed, 16 May 2012 16:03:08 -0700
Markus Scherer markus@gmail.com wrote:
The problem is a contraction x+0F72 and input text x+0F73 where the
inner 0F71 should be skipped. We can avoid this by adding a
contraction for x+0F73 (and one for the equivalent x+0F71+0F72).
On the other hand,
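The x+0F72 versus x+0F73 problem can be made concrete with a small sketch. It is an illustration added here, not ICU or UCA code. The letter "x" stands in for whatever base character the thread has in mind, and the naive prefix test is only a stand-in for contiguous contraction matching. U+0F73 decomposes canonically to U+0F71 U+0F72, so a contiguous match for x+0F72 fails on the decomposed text because U+0F71 sits in between.

```python
import unicodedata

text = "x\u0F73"         # placeholder base + TIBETAN VOWEL SIGN II
contraction = "x\u0F72"  # the contraction discussed above

nfd = unicodedata.normalize("NFD", text)
code_points = [f"U+{ord(c):04X}" for c in nfd]

# Canonical combining classes of the two vowel signs after decomposition
ccc_0f71 = unicodedata.combining("\u0F71")
ccc_0f72 = unicodedata.combining("\u0F72")

# A purely contiguous match misses the contraction: 0F71 intervenes
contiguous_hit = nfd.startswith(contraction)

# The explicit two-mark spelling is canonically equivalent to x+0F73
equivalent = unicodedata.normalize("NFD", "x\u0F71\u0F72") == nfd

print(code_points, ccc_0f71, ccc_0f72, contiguous_hit, equivalent)
```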
On Wed, 16 May 2012 21:46:17 -0700
Mark Davis ☕ m...@macchiato.com wrote:
No, it's not.
Including x in Lao for some pedagogical (I'm guessing) purpose is
completely out of scope. That'd be like including π in Latin because
it sometimes occurs in the middle of English text.
No, it's more
On Thu, 17 May 2012 22:14:55 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
It has x just like the rest of the Basic Latin alphabet, in one of its
input modes.
Which keyboard layout are you looking at? When present, it's usually
got by pressing SHIFT and the key used for U+0EAD LAO LETTER O.
On Thu, 17 May 2012 13:39:08 -0700
Markus Scherer markus@gmail.com wrote:
On Thu, May 17, 2012 at 1:02 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
As x = 0F71, we also need the
contractions of x+0F73 (or x+0F71+0F72) with 0F72, 0F74 and 0F80 to
give the pair
On Thu, 17 May 2012 22:56:51 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
Oh well... then the next time we'll discuss about including the Han
sinograms in the Latin script because we find discussions in English
about these sinograms. Then we'll start mixing all scripts together as
if they
On Thu, 17 May 2012 23:16:10 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
OK, OK So this looks like there's an 'x'-like letter in the Lao
script. But why should it be the Latin letter with all its allowed
variations, its dual case, its cursive joining, its serifs? Maybe
the letter
On Thu, 17 May 2012 15:42:37 -0700
Markus Scherer markus@gmail.com wrote:
On Thu, May 17, 2012 at 3:00 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
HOWEVER, you must *not* have the added contraction for 0F71+0F71.
If we don't have this prefix contraction, then we
On Thu, 17 May 2012 21:32:19 -0700
Markus Scherer markus@gmail.com wrote:
On Thu, May 17, 2012 at 4:29 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
As I've already said, DUCET 6.1.0 omits a contraction for 0FB2+0F71,
and
so CE(0FB2, 0334, 0F71, 0F80) = CE(0FB2+0F80
On Thu, 17 May 2012 21:32:19 -0700
Markus Scherer markus@gmail.com wrote:
Ok, but assuming we didn't add 0FB2+0F71, why can't we add the
contraction 0FB2+0F81 and have the 0334 and any other non-starter be
handled via discontiguous matching?
Time for me to make a pronouncement on
On Fri, 18 May 2012 09:51:34 -0700
Markus Scherer markus@gmail.com wrote:
There is nothing that requires us to get correct results *without
normalization* for all FCD strings or any other particular input
conditions (except NFD input).
So long as you don't claim conformance to the CLDR
On Fri, 18 May 2012 09:51:34 -0700
Markus Scherer markus@gmail.com wrote:
On inspection, we think we can do better (and want to), probably by
adding overlap contractions. If we get into trouble with that, we
will think of alternatives. One is to decompose more characters even
in FCD
On Sat, 19 May 2012 01:12:17 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
Just in case you haven't already thought of it, one reasonable scheme
would be to decompose input if and only if searching for contractions
or the input character could *hide* the start
On Sun, 20 May 2012 16:15:24 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
CORRECTION:
For the general case, we ought to be able to express a rule such as
'ignore the countering of soft-dottedness', as in Lithuanian casing,
but I don't see any finite method of expressing
On Sun, 20 May 2012 17:05:00 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
CORRECTION to correction
I wrote
rules for soft-dotted indecomposable+0307+ccc=203
when, of course, I meant
rules for soft-dotted indecomposable+0307+ccc=230
Sorry about that.
Richard.
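For readers checking the corrected figure, here is a brief sketch (added for illustration, using Python's standard `unicodedata`; the choice of U+0323 as a contrasting mark is an assumption made here). U+0307 COMBINING DOT ABOVE has canonical combining class 230, and canonical reordering places a class-220 mark before it.

```python
import unicodedata

dot_above = "\u0307"  # COMBINING DOT ABOVE
dot_below = "\u0323"  # COMBINING DOT BELOW, used here only for contrast

ccc_above = unicodedata.combining(dot_above)
ccc_below = unicodedata.combining(dot_below)

# Marks with different non-zero classes are sorted by class in NFD,
# so the class-220 mark moves in front of the class-230 mark
reordered = unicodedata.normalize("NFD", "i" + dot_above + dot_below)

print(ccc_above, ccc_below, [f"U+{ord(c):04X}" for c in reordered])
```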
What are the definitions of upper and lower case for the caseFirst
tailoring for the UCA and for LDML? I can't find any obvious
definition.
My suspicion is that they are defined by assignment of the DUCET
tertiary weights, UTS#10 Issue 23 (Version 6.1.0) Section 7.2.
Although these largely
On Mon, 21 May 2012 17:43:27 -0700
Ken Whistler k...@sybase.com wrote:
For example, when caseFirst is set to
uppercase, ICU orders U+1D34 MODIFIER LETTER CAPITAL H before
U+0068 LATIN SMALL LETTER H, but anomalously orders U+A7F8 MODIFIER
LETTER CAPITAL H WITH STROKE *after* U+0127 LATIN
On Sat, 19 May 2012 01:12:17 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
This will then work for DUCET
6.1.0, work for Danish, and work for my mischievous 0302 COMBINING
CIRCUMFLEX ACCENT+0067 LATIN SMALL LETTER G contraction.
There is a very similar rule in CLDR
On Mon, 21 May 2012 17:07:33 -0700
Markus Scherer markus@gmail.com wrote:
In principle, it's straightforward: Lowercase and uppercase follow
Unicode (UCD) case properties. We distinguish an intermediate mixed
case for titlecase characters and mixed-case contractions. I believe
we also
On Tue, 22 May 2012 08:33:43 -0700
Markus Scherer markus@gmail.com wrote:
On Tue, May 22, 2012 at 1:09 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
On Mon, 21 May 2012 17:07:33 -0700
Markus Scherer markus@gmail.com wrote:
I can dig up the ICU code
On Wed, 23 May 2012 10:35:46 -0700
Markus Scherer markus@gmail.com wrote:
On Tue, May 22, 2012 at 2:22 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
I found the code that computes the case bits (2 bits for
lower/mixed/upper) for building ICU tailorings. Search
On Wed, 23 May 2012 11:07:32 +0100
Michael Everson ever...@evertype.com wrote:
On 23 May 2012, at 09:41, Szelp, A. Sz. wrote:
We can wait and see whether there's need or real basis for
disunification.
The basis for disunification is that it is a major glyph change,
making it quite
On Wed, 23 May 2012 15:50:24 -0700
Markus Scherer markus@gmail.com wrote:
On Wed, May 23, 2012 at 2:01 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
While we're picking on that poor routine - it looks as though it
could come unstuck with kana in the supplementary
On Wed, 23 May 2012 17:47:09 -0700
Markus Scherer markus@gmail.com wrote:
On Wed, May 23, 2012 at 5:17 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
The order of code points and contractions as listed in
FractionalUCA.txt and allkeys.txt should be the same, except
On Wed, 23 May 2012 15:50:24 -0700
Markus Scherer markus@gmail.com wrote:
On Wed, May 23, 2012 at 2:01 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
Is there a definition of the precise
relationship between DUCET and FractionalUCA.txt, or does
FractionalUCA.txt
On Wed, 23 May 2012 17:47:09 -0700
Markus Scherer markus@gmail.com wrote:
Also, I just saw that
http://www.unicode.org/Public/UCA/latest/CollationAuxiliary.zip contains
allkeys_CLDR.txt which should correspond 1:1 with the
FractionalUCA*.txt in the same .zip file.
One format difference:
I'm currently reviewing the definition of the Unicode
Collation Algorithm (as opposed to just trying to comply with it),
and I came across the concept of collation grapheme clusters, defined in
UTS#18 'Unicode Regular Expressions'.
For what types of strings are they supposed to be defined? Any?
On Tue, 29 May 2012 12:52:12 -0700
Doug Ewell d...@ewellic.org wrote:
And yes, of course it's possible to stack an entire new layer on top
of the existing Windows key architecture, as Keyman does. Maybe that
is the long-term solution, but I haven't heard that MS is planning to
go that route.
On Fri, 25 May 2012 12:34:01 -0700
Markus Scherer markus@gmail.com wrote:
On Thu, May 24, 2012 at 5:36 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
I spotted two differences flicking through the end of the
differences -
Nice work! Please submit your findings via
Can someone please advise me as to the sorting of Pali as Pali in
Tibetan script. I need a prompt response rather than a complete
treatment. It is possible that I have misunderstood what I have
been able to pull together.
What I understand is the following:
(a) The retroflex lateral
On Sat, 7 Jul 2012 20:39:34 +0100 (BST)
Satyakam Phukan sphukan2...@yahoo.co.uk wrote:
Isn't the correct way of translating 'BENGALI' in Character names
into Assamese to use the word normally used to mean Assamese? What
problems does this approach leave?
Don't you think the Mons are
On Sat, 7 Jul 2012 17:43:41 -0500
Naena Guru naenag...@gmail.com wrote:
This is the Pali sorting order in PTS Pali. The Last letter is the
retroflex L:
a ā i ī u ū
e o
aṃ aaṃ iṃ iiṃ uṃ uuṃ
eṃ oṃ
k kh g gh ṅ
c ch j jh ñ
ṭ ṭh ḍ ḍh ṇ
t th d dh n
p ph b bh m
y r l v
s h
ḷ
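The order listed above could be turned into a sort key along the following lines. This is a rough sketch under stated assumptions, not a tested Pali collation. The names `PALI_ORDER` and `pali_key` are made up here. Placing ṃ as a letter of its own between the vowels and the consonants is an interpretation of the "aṃ ... oṃ" rows. Aspirates such as kh are always read as single letters, and the ASCII spellings "aa", "ii", "uu" are not handled.

```python
import unicodedata

# Letters in the order given in the message (romanized Pali).
# Assumption: "ṃ" sorts as its own letter after the vowels.
PALI_ORDER = [
    "a", "ā", "i", "ī", "u", "ū", "e", "o",
    "ṃ",
    "k", "kh", "g", "gh", "ṅ",
    "c", "ch", "j", "jh", "ñ",
    "ṭ", "ṭh", "ḍ", "ḍh", "ṇ",
    "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m",
    "y", "r", "l", "v",
    "s", "h",
    "ḷ",
]

# Normalize the table itself so lookups do not depend on how this file is saved
RANK = {unicodedata.normalize("NFC", letter): i
        for i, letter in enumerate(PALI_ORDER)}


def pali_key(word):
    """Return a list of letter ranks, preferring two-character letters."""
    word = unicodedata.normalize("NFC", word.lower())
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            key.append(RANK[pair])
            i += 2
        elif word[i] in RANK:
            key.append(RANK[word[i]])
            i += 1
        else:
            # Unknown characters sort after every listed letter
            key.append(len(PALI_ORDER) + ord(word[i]))
            i += 1
    return key


words = ["kha", "ka", "ḷa", "ā", "a", "ga", "ṭha", "ta"]
print(sorted(words, key=pali_key))
```

With this key the retroflex ḷ sorts last, as the message describes, rather than next to l as a code-point sort of the decomposed form would place it.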
On Sun, 8 Jul 2012 17:31:59 +0530
Shriramana Sharma samj...@gmail.com wrote:
And you will certainly agree that a non-native cannot immediately know
what is the significance of the Indic character names DA vs DDA (vs
DDDA or A), SSA, RRA, NNA, NNNA, LLA, LLLA and so on! :-)
On the
On Sun, 8 Jul 2012 18:44:41 +0530
Shriramana Sharma samj...@gmail.com wrote:
On Sun, Jul 8, 2012 at 6:32 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
On the contrary, doubling for (historical) retroflexion is a fairly
clear convention.
Where, please? I have never heard
Are the collation tests meant to have been updated for the change in
the draft of Step 2.1 of the collation algorithm? I haven't changed
what I believe to be a UCA 6.1.0-compliant implementation, yet my code
now passes the 6.2.0 tests for both DUCET and CLDR root. (I understand
that the error in
On Mon, 9 Jul 2012 10:39:52 +0200
Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote:
Jukka K. Korpela, Mon, 09 Jul 2012 10:04:08 +0300:
Adding new characters would be possible in principle, but hardly
realistic or useful in this case. They would not change the bulk of
existing
On Mon, 09 Jul 2012 05:20:45 +0200
Jean-François Colson j...@colson.eu wrote:
Le 09/07/12 01:29, Naena Guru a écrit :
Number of letters in Singhala is only theoretical. In the case of
Singhala orthography, the actually used number depends on the
Sanskrit vocabulary.
Do you mean there