Tex Texin tex at i18nguy dot com wrote:
http://www.unicode.org/unicode/uni2book/ch13.pdf
As I read that material, I take it to be saying that senders should
remove the I.A. characters.
What if I *want* to design an annotation-aware rendering mechanism?
Suppose I read Section 13.6 and decide
The text says: except for private agreement.
So if con-senting a-d-u-l-t-s want to exchange interlinear annotated
text, that is fine.
(I hyphenated the words because some of my previous emails were rejected
by Doug's filters.)
tex
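Tex's reading of Section 13.6 is that the interlinear annotation characters should only be exchanged by private agreement. The sketch below shows one thing a receiver that is not annotation-aware might do with them: keep the annotated base text and drop the markers together with the annotation text. It assumes well-formed, non-nested sequences of U+FFF9 INTERLINEAR ANNOTATION ANCHOR, U+FFFA INTERLINEAR ANNOTATION SEPARATOR and U+FFFB INTERLINEAR ANNOTATION TERMINATOR; the function name `strip_annotations` is hypothetical, not something the standard defines.

```python
# Interlinear annotation characters (U+FFF9..U+FFFB).
ANCHOR = "\ufff9"      # INTERLINEAR ANNOTATION ANCHOR: annotated text follows
SEPARATOR = "\ufffa"   # INTERLINEAR ANNOTATION SEPARATOR: annotation text follows
TERMINATOR = "\ufffb"  # INTERLINEAR ANNOTATION TERMINATOR: ends the annotation


def strip_annotations(text: str) -> str:
    """Hypothetical helper: keep base text, drop markers and annotation text.

    Assumes well-formed, non-nested anchor/separator/terminator sequences.
    """
    kept = []
    in_annotation = False
    for ch in text:
        if ch == ANCHOR:
            continue  # the annotated base text that follows is kept
        if ch == SEPARATOR:
            in_annotation = True  # skip everything up to the terminator
            continue
        if ch == TERMINATOR:
            in_annotation = False
            continue
        if not in_annotation:
            kept.append(ch)
    return "".join(kept)


print(strip_annotations("\ufff9Tokyo\ufffatoukyou\ufffb is big"))  # Tokyo is big
```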
Doug Ewell wrote:
Tex Texin tex at i18nguy dot com wrote:
At 09:37 14/08/02 +0430, Roozbeh Pournader wrote:
And it's also a reason why a compatibility decomposition is needed for
it. When some piece of modern software doesn't find it in an older font,
it can display it as its decomposition.
No, it can't.
(1) Most software doesn't know what
I read, somewhere, that certain code point ranges had been allocated properties (such
as LTR/RTL) in the Unicode tables even though some of them had not yet had characters
defined for them. Possibly someone can penetrate the vagueness of this memory and
confirm or deny?
If this is the case,
Any clue what I should set NLS_LANG to in .profile if I want to use the
Russian character set in a UNIX environment?
For American English it is NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1,
so what would be the equivalent setting for the Russian
character set?
Regards, Ankur Mahajan, Asst. Systems
Engr., Tata
This is my first posting to this list so please be gentle with me!
I have come across a confusing discrepancy between the official Unicode
description of some characters (i.e., the description in the Names List) and
the way they are graphically displayed in the Unicode Code Charts.
This appears
At 16:35 -0700 2002-08-13, Murray Sargent wrote:
Michael Everson said Well then they [interlinear annotation characters]
oughtn't to have been encoded.
Michael, you aren't an implementer.
I'm not the kind of implementor you are. I do implement things. :-)
When you implement things
At 17:59 -0700 2002-08-13, Kenneth Whistler wrote:
And Microsoft has others of such beasties hiding internally as
anchors for you-don't-wanna-know-what -- also not interchanged.
I am ***NOT*** bashing MS here, but what is everyone saying? That
these characters should be annotated in the
On Wed, 14 Aug 2002, Martin Kochanski wrote:
(1) Most software doesn't know what characters exist in any particular
font that the user happens to have chosen, and it doesn't want to know.
This is straightforward modular software design: some part of the
*operating system* is responsible for
Michael Everson wrote:
At 03:37 +0430 2002-08-09, Roozbeh Pournader wrote:
By not providing a compatibility decomposition, we are
making the proposed character a healthy and normal
characters, [...]
Doesn't matter where it's encoded. It is to be considered, if you
will pardon the term,
James Kass wrote as follows.
Indeed, a program designed to display actual superscripts based on
the notational form would work pretty much the same regardless
of whether standard or non-standard characters are used, and the
editing or input screen would also look essentially identical.
Yes,
Philipp Reichmuth wrote as follows.
Hello, William,
This is sort of lengthy once more. Forgive me and put me in your score
files. :-)
What please is a score file?
Note that asking Microsoft to have Notepad support courtyard codes is
a lot more work and a lot less likely to succeed than
Marco Cimarosti wrote as follows.
As you see, it is nowhere said that markup is necessarily something
beginning with < or any other character. The additional information
(markup) can be in any format, in fact the definition says: It is
expected that systems and applications will implement
Mark Davis wrote:
There is a new version of Unicode Technical Report #29: Text
Boundaries on http://www.unicode.org/reports/tr29/,
[...]
Feedback that is received before the UTC meeting (starting
August 20) can be
made available for the discussion of TR29 at that meeting.
I think that
See http://www.microsoft.com/opentype/fontpack/default.htm
Heads-up from http://zeldman.com/
John
William Overington wrote:
Marco Cimarosti wrote as follows.
As you see, it is nowhere said that markup is necessarily something
beginning with < or any other character. The additional information
(markup) can be in any format, in fact the definition says: It is
expected that systems and
William Overington scripsit:
This is sort of lengthy once more. Forgive me and put me in your score
files. :-)
What please is a score file?
A list, actual or notional, of persons from whom you do not wish to hear.
Also called a kill file.
As the Unicode Technical Committee is considering
Marco Cimarosti scripsit:
Moreover, as Martins-Tuválkin says, non-Catalan uses of U+00B7 are too
unusual and uninteresting to be taken as the default.
You omit, however, its very common use as a sign of multiplication.
BTW, notice that the most important of these non-Catalan usages work as
John Cowan wrote:
Marco Cimarosti scripsit:
If this is the case, decomposing the mark into the Arabic letters it derives
from would be as nonsensical as decomposing the question mark into the Latin
letters it derives from (Qo for quaestio).
I grant your Q but I doubt your o. In
James Kass scripsit:
Once a meaning like
INTERLINEAR ANNOTATION ANCHOR has been assigned to
a code point, any application which chooses to use that code
point for any other purpose would be at fault.
But a purely nominal one, since any use of these three codepoints
should be behind the
Hi Ankur,
The NLS_LANG environment variable is used for configuring the Oracle database
products. If you mean that you want to set up your copy of Oracle for Russian,
you could use:
NLS_LANG=RUSSIAN_CIS.charset
where charset is one of the following:
CHARACTERSET
Marco Cimarosti scripsit:
If this is the case, decomposing the mark into the Arabic letters it derives
from would be as nonsensical as decomposing the question mark into the Latin
letters it derives from (Qo for quaestio).
I grant your Q but I doubt your o. In all fonts known to me, the dot
Michael Everson scripsit:
Excuse me, this makes no sense whatsoever. If your company, for
instance, needed INTERNAL code points to attach to higher level
protocols, why did you not use the Private Use Area?
Well, suppose I wanted to use a codepoint internally to a program for
some
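Michael's alternative, internal code points taken from the Private Use Area, is easy to test for. A minimal sketch, assuming the current PUA ranges (U+E000..U+F8FF plus planes 15 and 16) and cross-checking them against the General_Category value Co reported by Python's `unicodedata`; `is_private_use` and `INTERNAL_ANCHOR` are hypothetical names chosen for illustration.

```python
import unicodedata

# Private Use Area ranges: the BMP block plus the two supplementary planes.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(cp: int) -> bool:
    """True if the code point lies in one of the Private Use ranges."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


# Hypothetical internal anchor taken from the PUA instead of U+FFF9.
INTERNAL_ANCHOR = 0xE000

print(is_private_use(INTERNAL_ANCHOR), unicodedata.category(chr(INTERNAL_ANCHOR)))  # True Co
print(is_private_use(0xFFF9), unicodedata.category("\ufff9"))  # False Cf
```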
John Cowan wrote:
Marco Cimarosti scripsit:
Moreover, as Martins-Tuválkin says, non-Catalan uses of U+00B7 are too
unusual and uninteresting to be taken as the default.
You omit, however, its very common use as a sign of multiplication.
Actually, I don't see it very often.
BTW,
U+0360 COMBINING DOUBLE TILDE
U+035D COMBINING DOUBLE BREVE
U+035E COMBINING DOUBLE MACRON
U+035F COMBINING DOUBLE LOW LINE
I also note U+0361 COMBINING DOUBLE INVERTED BREVE and U+0362 COMBINING
DOUBLE RIGHTWARDS ARROW BELOW in the code chart.
I wonder if someone could please clarify how
William Overington scripsit:
As first letter and second letter could be theoretically almost any other
Unicode characters, would the approach be to just place all three glyphs
superimposed onto the screen and hope that the visual effect is reasonable
or would a font have a special glyph
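On the encoding side, at least, the order is fixed: a double diacritic is stored after the first of the two base letters, so the sequence is first letter, mark, second letter. Whether it is then drawn by overlaying glyphs, by mark positioning, or with a dedicated ligature glyph is a font and rendering decision that the sketch below does not address. The helper name `with_double_diacritic` is hypothetical.

```python
import unicodedata

DOUBLE_INVERTED_BREVE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE


def with_double_diacritic(first: str, second: str, mark: str = DOUBLE_INVERTED_BREVE) -> str:
    """Hypothetical helper: encode a double diacritic spanning two letters.

    In the encoded text the mark follows the first base letter; positioning
    it across both glyphs is left to the font and rendering engine.
    """
    if unicodedata.combining(mark) == 0:
        raise ValueError("mark must be a combining character")
    return first + mark + second


tied = with_double_diacritic("t", "s")
print([f"U+{ord(c):04X}" for c in tied])  # ['U+0074', 'U+0361', 'U+0073']
print(unicodedata.combining(DOUBLE_INVERTED_BREVE))  # 234
```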
Martin Kochanski unicode at cardbox dot net wrote:
I read, somewhere, that certain code point ranges had been allocated
properties (such as LTR/RTL) in the Unicode tables even though some
of them had not yet had characters defined for them. Possibly someone
can penetrate the vagueness of
At 14:39 +0100 2002-08-14, William Overington wrote:
Suggestions for other ligatures and abbreviations to add into the
golden ligatures collection are also welcome.
I suggest you stop calling it the golden ligatures collection. This
term imputes a status and nobility to it which it simply
At 20:09 -0700 2002-08-12, Doug Ewell wrote:
Everybody will welcome the new conventional, graphical-type
characters and scripts that are coming with Unicode 4.0. But maybe
before standardizing another COMBINING GRAPHEME JOINER or other
control-type character, it would be prudent to study the
I (Marco Cimarosti) wrote:
Mark Davis wrote:
Feedback that is received before the UTC meeting (starting
August 20) can be made available for the discussion of
TR29 at that meeting.
The handling of apostrophe is not satisfactory for French and
Italian, as the document itself
Doug Ewell wrote:
I'll have to check with Adelphia and see who or what is trying to
protect me from myself.
Those automatic b*llsh*ts!
A few years ago I was temporarily assigned to the central national office of
my previous employer. It was when the Unicode list was discussing something
about
Mark Davis wrote:
Note that we have a gazillion other dots already:
...
And these are just the obvious ones found with a quick search (and just
for the single dots). There are probably more hiding out in little
corners of scripts (it's a bit like Where's Waldo looking for them).
To find
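A "quick search" of this kind can be approximated with Python's `unicodedata` module. A rough sketch, assuming that a character counts as a single dot when its name contains the word DOT and its General_Category is punctuation or symbol; the heuristic deliberately skips combining marks such as COMBINING DOT ABOVE and letters such as LATIN SMALL LETTER A WITH DOT ABOVE, and what it finds depends on the Unicode version bundled with Python. `single_dot_characters` is a hypothetical name.

```python
import sys
import unicodedata


def single_dot_characters():
    """Heuristic search for standalone 'dot' characters.

    Keeps characters whose name contains the word DOT (so DOTS and
    DOTTED do not match) and whose General_Category is punctuation (P*)
    or symbol (S*), which skips letters such as 'A WITH DOT ABOVE'.
    """
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if "DOT" in name.split() and unicodedata.category(ch)[0] in "PS":
            found.append(ch)
    return found


dots = single_dot_characters()
print(len(dots))  # varies with the Unicode version Python ships with
for ch in dots[:5]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```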
Someone asked what new scripts were arriving in Unicode 4.0.
This list is taken from the pipeline page
(http://www.unicode.org/unicode/alloc/Pipeline.html):
Limbu (Kirat)
Tai Le
Uralic Phonetic Alphabet (part of Latin, technically)
Linear B (syllabary and ideographs)
Aegean Numbers
Ugaritic
Hello Marco,
Your definition of LatinVowel is problematic. Is Y only a vowel in
French? In a word such as yeux, it certainly is a consonant. Could
this lead to problems?
Defining such classes has the problem that they easily appear too
general. The mere name LatinVowel looks too much like this
David Possin wrote:
How does the y in the English word rhythm fit in here? I
am not sure if it is called a vowel in English.
I think it should, in this case. The y in yes is a consonant.
_ Marco
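Marco finds TR29's handling of the apostrophe unsatisfactory for French and Italian, where it often marks elision (l'amico, qu'il). A minimal sketch of a tailoring along the lines discussed, assuming a hypothetical LatinVowel-style class; the regular expression and the names `LATIN_VOWELS` and `split_elisions` are illustrative and not part of TR29. It also shows why such a class is awkward: including Y or leaving out H changes the results.

```python
import re
from typing import List

# Hypothetical vowel class for illustration only (not TR29 data). Y is
# included to show the effect of the choice debated in the thread; H is
# left out, so elision before a mute h (l'homme) is not split.
LATIN_VOWELS = "aeiouyàâäéèêëîïôöùûüAEIOUYÀÂÄÉÈÊËÎÏÔÖÙÛÜ"

# Zero-width break after a letter plus apostrophe (ASCII or U+2019),
# but only when a vowel from the class above follows.
ELISION_BREAK = re.compile(r"(?<=[^\W\d_]['\u2019])(?=[" + LATIN_VOWELS + "])")


def split_elisions(token: str) -> List[str]:
    """Split a token like "l'amico" into ["l'", "amico"]."""
    return ELISION_BREAK.split(token)


for word in ["l'amico", "dell'uomo", "qu'il", "aujourd'hui", "l'homme"]:
    print(word, split_elisions(word))
```

Splitting on a zero-width match requires Python 3.7 or later. Note that "l'homme" stays whole under this class, which is exactly the kind of gap the LatinVowel discussion points at.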
On Wed, Aug 14, 2002 at 12:55:26PM -0400, John Cowan wrote:
Someone asked what new scripts were arriving in Unicode 4.0.
This list is taken from the pipeline page
(http://www.unicode.org/unicode/alloc/Pipeline.html):
what are the plans for the Glagolitic script?
--
At 19:44 +0200 2002-08-14, Radovan Garabik wrote:
On Wed, Aug 14, 2002 at 12:55:26PM -0400, John Cowan wrote:
Someone asked what new scripts were arriving in Unicode 4.0.
This list is taken from the pipeline page
(http://www.unicode.org/unicode/alloc/Pipeline.html):
what are the plans
Michael Everson wrote in response to William Overington,
I suggest you stop calling it the golden ligatures collection. This
term imputes a status and nobility to it which it simply doesn't
have. Indeed, I suggest that you abandon this task and use
appropriate font technology to
Proposed additions can be seen in a couple of PDF format charts
provided by Asmus Freytag:
BMP proposed additions:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2491.pdf
non-BMP proposed additions:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2492.pdf
Best regards,
James Kass.
- Original Message
Doug (and Michael also):
What if I *want* to design an annotation-aware rendering mechanism?
Suppose I read Section 13.6 and decide that, instead of just throwing
the annotation characters away, I should attempt to display them
directly above (and smaller than) the normal text, the way
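An annotation-aware renderer of the kind proposed here first has to recover which text is annotated and what the annotation is. A sketch, assuming one annotation per anchor and no nesting; `parse_annotations` and `to_ruby_html` are hypothetical, and mapping the pairs to HTML ruby markup is only one possible way to get the annotation displayed above, and smaller than, the base text.

```python
import html
from typing import List, Tuple, Union

ANCHOR = "\ufff9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\ufffa"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\ufffb"  # INTERLINEAR ANNOTATION TERMINATOR

# A segment is either plain text or an (annotated_text, annotation) pair.
Segment = Union[str, Tuple[str, str]]


def parse_annotations(text: str) -> List[Segment]:
    """Split text into plain runs and (base, annotation) pairs.

    Assumes one annotation per anchor and no nesting; a malformed
    sequence is kept as literal text.
    """
    segments: List[Segment] = []
    i = 0
    while i < len(text):
        start = text.find(ANCHOR, i)
        if start == -1:
            segments.append(text[i:])
            break
        if start > i:
            segments.append(text[i:start])
        sep = text.find(SEPARATOR, start + 1)
        end = text.find(TERMINATOR, sep + 1) if sep != -1 else -1
        if end == -1:
            segments.append(text[start:])  # malformed: keep the rest as-is
            break
        segments.append((text[start + 1:sep], text[sep + 1:end]))
        i = end + 1
    return segments


def to_ruby_html(segments: List[Segment]) -> str:
    """Render pairs as HTML ruby so the annotation sits above the base text."""
    parts = []
    for seg in segments:
        if isinstance(seg, tuple):
            base, note = seg
            parts.append(f"<ruby>{html.escape(base)}<rt>{html.escape(note)}</rt></ruby>")
        else:
            parts.append(html.escape(seg))
    return "".join(parts)


segs = parse_annotations("I love \ufff9Tokyo\ufffatoukyou\ufffb!")
print(segs)                # ['I love ', ('Tokyo', 'toukyou'), '!']
print(to_ruby_html(segs))  # I love <ruby>Tokyo<rt>toukyou</rt></ruby>!
```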
John Hudson mused:
Love the HOT BEVERAGE character, but where's the TALL LOWFAT SOYMILK MOCHA
FRAPPUCCINO? Come on guys, there's enough blank spaces in that block for
the entire Starbucks beverage menu, especially if you treat things like
EXTRA FOAM as a combining character.
Well,
- Original Message -
From: John Hudson [EMAIL PROTECTED]
At 11:11 AM 14-08-02, James Kass wrote:
BMP proposed additions:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2491.pdf
Love the HOT BEVERAGE character, but where's the TALL LOWFAT SOYMILK MOCHA
FRAPPUCCINO? Come on guys, there's
Patrick Andries wrote in response to John Hudson,
Love the HOT BEVERAGE character, but where's the TALL LOWFAT SOYMILK MOCHA
FRAPPUCCINO? Come on guys, there's enough blank spaces in that block for
the entire Starbucks beverage menu, especially if you treat things like
EXTRA FOAM as a
Theodore H. Smith scripsit:
What's the point of having more Latin characters? Do they look
like normal Roman characters? I think we have a few versions (3
or more?) of them, already. I thought once was enough.
UPA is like IPA: it exploits certain potentials of Latin script that
aren't
William Overington teased us all unmercifully with:
It occurs to me that it is possible to introduce a convention, either as a
matter included in the Unicode specification, or as just a known about
thing, that if one has a plain text Unicode file with a file name that has
some particular
Where does this strange beast come from? Its name is LATIN SMALL LETTER
A WITH RIGHT HALF RING, and the right half ring is indeed above the a.
We don't have a RIGHT HALF RING ABOVE combining mark, so it only gets a
compatibility decomposition.
Who would need a lower-case letter with a unique
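John's point can be checked directly with Python's `unicodedata` module. The sketch shows that U+1E9A has only a compatibility mapping, to U+0061 followed by U+02BE MODIFIER LETTER RIGHT HALF RING, which is a spacing modifier letter rather than a combining mark above. It is also the substitute that the decomposition-based display fallback debated earlier in the thread would produce.

```python
import unicodedata

ch = "\u1e9a"
print(unicodedata.name(ch))           # LATIN SMALL LETTER A WITH RIGHT HALF RING
print(unicodedata.decomposition(ch))  # <compat> 0061 02BE

# No canonical decomposition, so NFD leaves the character alone ...
print(unicodedata.normalize("NFD", ch) == ch)  # True

# ... while NFKD replaces it with a plain a and a *spacing* modifier letter.
fallback = unicodedata.normalize("NFKD", ch)
print([unicodedata.name(c) for c in fallback])
# ['LATIN SMALL LETTER A', 'MODIFIER LETTER RIGHT HALF RING']
```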
Kenneth Whistler wrote in response to William Overington,
...or to pick an extension, more or less at random, say .html
The file story7.uof could thus be used with a file named story.txt so as to
indicate which objects were intended to be used for three uses of U+FFFC in
the file
John Cowan asked:
Where does this strange beast come from?
Semitic transliteration practice, if I recall correctly.
Its name is LATIN SMALL LETTER
A WITH RIGHT HALF RING, and the right half ring is indeed above the a.
We don't have a RIGHT HALF RING ABOVE combining mark, so it only gets
This is my first posting to this list so please be gentle with me!
*pounces and begins to play with the little furry creature (gently)*
Can someone help me with this confusion as I am unsure how I should
structure these WITH CEDILLA characters in fonts I'm working on.
See TUS 3.0, pp.
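A font developer can at least confirm what the character data says. A sketch covering what is presumably the discrepancy described: the Latvian letters are named WITH CEDILLA and decompose to U+0327 COMBINING CEDILLA, even though the code charts draw them with a comma-shaped mark below, so the comma is a glyph convention rather than a different character. The helper `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> str:
    """Code point, Unicode name and decomposition mapping of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{unicodedata.decomposition(ch)}]"


# c with cedilla next to three Latvian letters that the code charts
# draw with a comma below.
for ch in "\u00e7\u0137\u013c\u0146":
    print(describe(ch))
# U+00E7 LATIN SMALL LETTER C WITH CEDILLA [0063 0327]
# U+0137 LATIN SMALL LETTER K WITH CEDILLA [006B 0327]
# U+013C LATIN SMALL LETTER L WITH CEDILLA [006C 0327]
# U+0146 LATIN SMALL LETTER N WITH CEDILLA [006E 0327]
```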
The unicode web server is off-line for an upgrade. It will be
restored to service as soon as possible.
-- Sarasvati
MC Consonants [j] and [w] have the special status of semivowels in
MC Romance languages, which means that they often behave as vowels
MC do, including in the rules for elision.
One has to differentiate between phonemes and graphemes. Unicode, of
course, operates on the grapheme level, and thus
- Original Message -
From: Philipp Reichmuth [EMAIL PROTECTED]
MC Consonants [j] and [w] have the special status of semivowels in
MC Romance languages, which means that they often behave as vowels
MC do, including in the rules for elision.
One has to differentiate between phonemes
On Wed, 14 Aug 2002, Marco Cimarosti wrote:
Given its usage in text, couldn't it be considered a punctuation mark?
No, I don't agree. It looks more like a dingbat to me, as long as you don't
get very philosophical.
If this is the case, decomposing the mark into the Arabic letters it
Markus Scherer markus dot scherer at jtcsv dot com wrote:
Note that we have a gazillion other dots already:
...
And these are just the obvious ones found with a quick search (and
just for the single dots). There are probably more hiding out in
little corners of scripts (it's a bit like
John Cowan wrote as follows.
In essence, though not formally, U+FFF9..U+FFFC are non-characters as
well, and the Unicode semantics just tells what programs *may* find them
useful for. Unicode 4.0 editors: it might be a good idea to emphasize
the close relationship of this small repertoire with
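John's "in essence, though not formally" distinction can be made concrete. A sketch: the formal noncharacters are U+FDD0..U+FDEF plus the last two code points of every plane, and a test for them does not flag U+FFF9..U+FFFC, which instead carry ordinary General_Category values (Cf for the three annotation characters, So for U+FFFC OBJECT REPLACEMENT CHARACTER). `is_noncharacter` is a hypothetical helper name.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Formal noncharacters: U+FDD0..U+FDEF and the last two code points
    of every plane (U+xxFFFE and U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


for cp in (0xFFF9, 0xFFFA, 0xFFFB, 0xFFFC, 0xFFFE, 0xFFFF):
    print(f"U+{cp:04X}", is_noncharacter(cp), unicodedata.category(chr(cp)))
# U+FFF9 False Cf
# U+FFFA False Cf
# U+FFFB False Cf
# U+FFFC False So
# U+FFFE True Cn
# U+FFFF True Cn
```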