Re: Furigana

2002-08-14 Thread Doug Ewell
Tex Texin tex at i18nguy dot com wrote: http://www.unicode.org/unicode/uni2book/ch13.pdf As I read that material, I take it to be saying that senders should remove the I.A. characters. What if I *want* to design an annotation-aware rendering mechanism? Suppose I read Section 13.6 and decide
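To make the two readings concrete, here is a minimal Python sketch of the default behaviour (a sender removing the annotation characters) and an annotation-aware alternative. It assumes the simple non-nested layout of INTERLINEAR ANNOTATION ANCHOR (U+FFF9), base text, INTERLINEAR ANNOTATION SEPARATOR (U+FFFA), annotation, and INTERLINEAR ANNOTATION TERMINATOR (U+FFFB). The function names `parse_annotations` and `strip_annotations` are hypothetical and come from no library.

```python
import re

# The three interlinear annotation characters.
IA_ANCHOR = "\uFFF9"      # marks the start of the annotated (base) text
IA_SEPARATOR = "\uFFFA"   # marks the start of the annotation
IA_TERMINATOR = "\uFFFB"  # marks the end of the annotation

_IA_CHARS = IA_ANCHOR + IA_SEPARATOR + IA_TERMINATOR

# One annotated span: anchor, base, separator, annotation, terminator.
# Assumption: no nesting and only one separator per span.
_IA_SPAN = re.compile(
    IA_ANCHOR + "([^" + _IA_CHARS + "]*)"
    + IA_SEPARATOR + "([^" + _IA_CHARS + "]*)"
    + IA_TERMINATOR
)


def parse_annotations(text):
    """Return (base, annotation) pairs; unannotated runs get annotation None."""
    result = []
    pos = 0
    for m in _IA_SPAN.finditer(text):
        if m.start() > pos:
            result.append((text[pos:m.start()], None))
        result.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(text):
        result.append((text[pos:], None))
    return result


def strip_annotations(text):
    """Keep the base text and drop both the annotations and the IA characters."""
    return _IA_SPAN.sub(lambda m: m.group(1), text)
```

A renderer that wants furigana-style output would walk the pairs from `parse_annotations` and draw each non-`None` annotation smaller and above its base text. A sender following the plain-text default would simply call `strip_annotations`.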

Re: Furigana

2002-08-14 Thread Tex Texin
The text says: except for private agreement. So if con-senting a-d-u-l-t-s want to exchange interlinear annotated text, that is fine. (I hyphenated the words because some of my previous emails were rejected by Doug's filters.) tex Doug Ewell wrote: Tex Texin tex at i18nguy dot com wrote:

Re: Digraphs as Distinct Logical Units

2002-08-14 Thread Martin Kochanski
At 09:37 14/08/02 +0430, Roozbeh Pournader wrote: And it's also a reason for why a compatibility decomposition is needed for it. When some piece of modern software doesn't find it in an older font, it can display it as its decomposition. No, it can't. (1) Most software doesn't know what

Re: Double Macrons on gh (was Re: Tildes on Vowels)

2002-08-14 Thread Martin Kochanski
I read, somewhere, that certain code point ranges had been allocated properties (such as LTR/RTL) in the Unicode tables even though some of them had not yet had characters defined for them. Possibly someone can penetrate the vagueness of this memory and confirm or deny? If this is the case,

NLS_LANG for russian characterset in UNIX

2002-08-14 Thread Ankur Mahajan
Any clue? If I want to use the Russian character set in a UNIX environment, what should I set in .profile for NLS_LANG? For American English it is NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1, so what should the equivalent setting be for the Russian character set? Regds, Ankur Mahajan, Asst. Systems Engr., Tata

Discrepancy between Names List Code Charts?

2002-08-14 Thread Kevin Brown
This is my first posting to this list so please be gentle with me! I have come across a confusing discrepancy between the official unicode description of some characters (ie the description in the Names List) and the way they are graphically displayed in the Unicode Code Charts. This appears

RE: Furigana

2002-08-14 Thread Michael Everson
At 16:35 -0700 2002-08-13, Murray Sargent wrote: Michael Everson said Well then they [interlinear annotation characters] oughtn't to have been encoded. Michael, you aren't an implementer. I'm not the kind of implementor you are. I do implement things. :-) When you implement things

Re: RE: Furigana

2002-08-14 Thread Michael Everson
At 17:59 -0700 2002-08-13, Kenneth Whistler wrote: And Microsoft has others of such beasties hiding internally as anchors for you-don't-wanna-know-what -- also not interchanged. I am ***NOT*** bashing MS here, but what is everyone saying? That these characters should be annotated in the

Re: Digraphs as Distinct Logical Units

2002-08-14 Thread Roozbeh Pournader
On Wed, 14 Aug 2002, Martin Kochanski wrote: (1) Most software doesn't know what characters exist in any particular font that the user happens to have chosen, and it doesn't want to know. This is straightforward modular software design: some part of the *operating system* is responsible for

RE: Digraphs as Distinct Logical Units

2002-08-14 Thread Marco Cimarosti
Michael Everson wrote: At 03:37 +0430 2002-08-09, Roozbeh Pournader wrote: By not providing a compatibility decomposition, we are making the proposed character a healthy and normal character, [...] Doesn't matter where it's encoded. It is to be considered, if you will pardon the term,

Re: Tildes on vowels

2002-08-14 Thread William Overington
James Kass wrote as follows. Indeed, a program designed to display actual superscripts based on the notational form would work pretty much the same regardless of whether standard or non-standard characters are used, and the editing or input screen would also look essentially identical. Yes,

Re: Tildes on vowels

2002-08-14 Thread William Overington
Philipp Reichmuth wrote as follows. Hello, William, This is sort of lengthy once more. Forgive me and put me in your score files. :-) What please is a score file? Note that asking Microsoft to have Notepad support courtyard codes is a lot more work and a lot less likely to succeed than

Re: Tildes on vowels

2002-08-14 Thread William Overington
Marco Cimarosti wrote as follows. As you see, it is nowhere said that markup is necessarily something beginning with or any other character. The additional information (markup) can be in any format, in fact the definition says: It is expected that systems and applications will implement

RE: New version of TR29:

2002-08-14 Thread Marco Cimarosti
Mark Davis wrote: There is a new version of Unicode Technical Report #29: Text Boundaries on http://www.unicode.org/reports/tr29/, [...] Feedback that is received before the UTC meeting (starting August 20) can be made available for the discussion of TR29 at that meeting. I think that

Microsoft withdraws web fonts

2002-08-14 Thread john . colby
see http://www.microsoft.com/opentype/fontpack/default.htm Headsup from http://zeldman.com/ John

RE: Tildes on vowels

2002-08-14 Thread Marco Cimarosti
William Overington wrote: Marco Cimarosti wrote as follows. As you see, it is nowhere said that markup is necessarily something beginning with or any other character. The additional information (markup) can be in any format, in fact the definition says: It is expected that systems and

Re: Tildes on vowels

2002-08-14 Thread John Cowan
William Overington scripsit: This is sort of lengthy once more. Forgive me and put me in your score files. :-) What please is a score file? A list, actual or notional, of persons from whom you do not wish to hear. Also called a kill file. As the Unicode Technical Committee is considering

Re: New version of TR29:

2002-08-14 Thread John Cowan
Marco Cimarosti scripsit: Moreover, as Martins-Tuválkin says, non-Catalan uses of U+00B7 are too unusual and uninteresting to be taken as the default. You omit, however, its very common use as a sign of multiplication. BTW, notice that the most important of these non-Catalan usages work as
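A toy illustration of the Catalan case, loosely in the spirit of TR29's letter, mid-letter, letter rules but not its actual rule set: a hypothetical `split_words` that keeps U+00B7 MIDDLE DOT inside a word only when letters stand on both sides. A digit on either side, as in the multiplication use John mentions, still ends the word.

```python
MIDDLE_DOT = "\u00B7"


def split_words(text):
    """Split text into words; U+00B7 between two letters stays word-internal."""
    words, current = [], []
    for i, ch in enumerate(text):
        if ch.isalpha():
            current.append(ch)
        elif (ch == MIDDLE_DOT and current
              and i + 1 < len(text) and text[i + 1].isalpha()):
            # Letter before (current is non-empty) and letter after: Catalan l·l.
            current.append(ch)
        else:
            # Any other character (space, digit, punctuation) ends the word.
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words
```

With this rule, `split_words("col·lecció de 3·4")` keeps "col·lecció" whole and yields no word for "3·4", since digits are not letters here.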

RE: Digraphs as Distinct Logical Units

2002-08-14 Thread Marco Cimarosti
John Cowan wrote: Marco Cimarosti scripsit: If this is the case, decomposing the mark into the Arabic letters it derives from would be as nonsensical as decomposing the question mark into the Latin letters it derives from (Qo for quaestio). I grant your Q but I doubt your o. In

Re: Furigana

2002-08-14 Thread John Cowan
James Kass scripsit: Once a meaning like INTERLINEAR ANNOTATION ANCHOR has been assigned to a code point, any application which chooses to use that code point for any other purpose would be at fault. But a purely nominal one, since any use of these three codepoints should be behind the

RE: NLS_LANG for russian characterset in UNIX

2002-08-14 Thread Addison Phillips [wM]
Hi Ankur, The NLS_LANG environment variable is used for configuring the Oracle database products. If you mean that you want to set up your copy of Oracle for Russian, you could use: NLS_LANG=RUSSIAN_CIS.charset where charset is one of the following: CHARACTERSET

Re: Digraphs as Distinct Logical Units

2002-08-14 Thread John Cowan
Marco Cimarosti scripsit: If this is the case, decomposing the mark into the Arabic letters it derives from would be as nonsensical as decomposing the question mark into the Latin letters it derives from (Qo for quaestio). I grant your Q but I doubt your o. In all fonts known to me, the dot

Re: Furigana

2002-08-14 Thread John Cowan
Michael Everson scripsit: Excuse me, this makes no sense whatsoever. If your company, for instance, needed INTERNAL code points to attach to higher level protocols, why did you not use the Private Use Area? Well, suppose I wanted to use a codepoint internally to a program for some

RE: New version of TR29:

2002-08-14 Thread Marco Cimarosti
John Cowan wrote: Marco Cimarosti scripsit: Moreover, as Martins-Tuválkin says, non-Catalan uses of U+00B7 are too unusual and uninteresting to be taken as the default. You omit, however, its very common use as a sign of multiplication. Actually, I don't see it very often. BTW,

Re: Double Macrons on gh (was Re: Tildes on Vowels)

2002-08-14 Thread William Overington
U+0360 COMBINING DOUBLE TILDE U+035D COMBINING DOUBLE BREVE U+035E COMBINING DOUBLE MACRON U+035F COMBINING DOUBLE LOW LINE I also note U+0361 COMBINING DOUBLE INVERTED BREVE and U+0362 COMBINING DOUBLE RIGHTWARDS ARROW BELOW in the code chart. I wonder if someone could please clarify how
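On the storage side at least, the order is fixed: a double diacritic is stored between the two base letters it spans (first letter, double mark, second letter). A short Python check using the standard `unicodedata` module shows this for "g" plus COMBINING DOUBLE MACRON plus "h"; it says nothing about glyph positioning, which is left to the font and rendering engine.

```python
import unicodedata

# First letter, combining double mark, second letter.
gh_macron = "g\u035Eh"   # g + COMBINING DOUBLE MACRON + h

for ch in gh_macron:
    # ccc is the canonical combining class; the double marks above the
    # base line have class 234, the double marks below have class 233.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"ccc={unicodedata.combining(ch)}")

# No precomposed form exists, so normalization leaves the sequence alone.
assert unicodedata.normalize("NFC", gh_macron) == gh_macron
```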

Re: Double Macrons on gh (was Re: Tildes on Vowels)

2002-08-14 Thread John Cowan
William Overington scripsit: As first letter and second letter could be theoretically almost any other Unicode characters, would the approach be to just place all three glyphs superimposed onto the screen and hope that the visual effect is reasonable or would a font have a special glyph

Re: Double Macrons on gh (was Re: Tildes on Vowels)

2002-08-14 Thread Doug Ewell
Martin Kochanski unicode at cardbox dot net wrote: I read, somewhere, that certain code point ranges had been allocated properties (such as LTR/RTL) in the Unicode tables even though some of them had not yet had characters defined for them. Possibly someone can penetrate the vagueness of

Re: Gutenberg's ligatures (spins off from Re: Tildes on vowels)

2002-08-14 Thread Michael Everson
At 14:39 +0100 2002-08-14, William Overington wrote: Suggestions for other ligatures and abbreviations to add into the golden ligatures collection are also welcome. I suggest you stop calling it the golden ligatures collection. This term imputes a status and nobility to it which it simply

Re: Furigana

2002-08-14 Thread Michael Everson
At 20:09 -0700 2002-08-12, Doug Ewell wrote: Everybody will welcome the new conventional, graphical-type characters and scripts that are coming with Unicode 4.0. But maybe before standardizing another COMBINING GRAPHEME JOINER or other control-type character, it would be prudent to study the

RE: New version of TR29:

2002-08-14 Thread Marco Cimarosti
I (Marco Cimarosti) wrote: Mark Davis wrote: Feedback that is received before the UTC meeting (starting August 20) can be made available for the discussion of TR29 at that meeting. The handling of apostrophe is not satisfactory for French and Italian, as the document itself

RE: Furigana

2002-08-14 Thread Marco Cimarosti
Doug Ewell wrote: I'll have to check with Adelphia and see who or what is trying to protect me from myself. Those automatic b*llsh*ts! A few years ago I was temporarily assigned to the central national office of my previous employer. It was when the Unicode list was discussing something about

Re: Dots as far as the eye can see (formerly: Re: New version of TR29:)

2002-08-14 Thread Markus Scherer
Mark Davis wrote: Note that we have a gazillion other dots already: ... And these are just the obvious ones found with a quick search (and just for the single dots). There are probably more hiding out in little corners of scripts (it's a bit like Where's Waldo looking for them). To find
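A quick search of this kind can be approximated with Python's `unicodedata` module. The helper `find_dots` below is hypothetical: it keeps characters whose names contain DOT as a whole word (so DOTS and DOTTED are excluded) and skips letters such as LATIN CAPITAL LETTER I WITH DOT ABOVE. The result depends on the Unicode version bundled with the Python build, so today it will list many more than existed in 2002.

```python
import sys
import unicodedata


def find_dots(keyword="DOT"):
    """Return (code point, name) for non-letter characters named with keyword."""
    hits = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")   # "" for unnamed code points
        words = name.split()
        if keyword in words and "LETTER" not in words:
            hits.append((cp, name))
    return hits


for cp, name in find_dots():
    print(f"U+{cp:04X} {name}")
```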

Scripts in Unicode 4.0

2002-08-14 Thread John Cowan
Someone asked what new scripts were arriving in Unicode 4.0. This list is taken from the pipeline page (http://www.unicode.org/unicode/alloc/Pipeline.html): Limbu (Kirat), Tai Le, Uralic Phonetic Alphabet (part of Latin, technically), Linear B (syllabary and ideographs), Aegean Numbers, Ugaritic

Re: New version of TR29:

2002-08-14 Thread Philipp Reichmuth
Hello Marco, Your definition of LatinVowel is problematic. Is Y only a vowel in French? In a word such as yeux, it certainly is a consonant. Could this lead to problems? Defining such classes has the problem that they easily appear too general. The mere name LatinVowel looks too much like this
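To see where such a class would bite, here is a Python sketch of an elision rule that breaks after an apostrophe when a letter precedes it and a vowel follows it. Both `LATIN_VOWELS` and `split_elision` are hypothetical; whether y (as in yeux) belongs in the class is exactly the point under discussion, and this sketch leaves it out.

```python
import unicodedata

# Hypothetical "LatinVowel" class; y is deliberately left out here.
LATIN_VOWELS = set("aeiou")
APOSTROPHES = "'\u2019"   # ASCII apostrophe and RIGHT SINGLE QUOTATION MARK


def is_latin_vowel(ch):
    """Compare the base letter only, so é, à, ô count as vowels."""
    base = unicodedata.normalize("NFD", ch)[0].lower()
    return base in LATIN_VOWELS


def split_elision(word):
    """Split l'amico into ["l'", "amico"]; leave other apostrophes alone."""
    for i, ch in enumerate(word):
        if (ch in APOSTROPHES and 0 < i < len(word) - 1
                and word[i - 1].isalpha() and is_latin_vowel(word[i + 1])):
            return [word[: i + 1]] + split_elision(word[i + 1:])
    return [word]
```

For example, `split_elision("l'amico")` gives `["l'", "amico"]`, while `"rock'n'roll"` stays whole because no vowel follows either apostrophe.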

RE: New version of TR29:

2002-08-14 Thread Marco Cimarosti
David Possin wrote: How does the y in the English word rhythm fit in here? I am not sure if it is called a vowel in English. I think it should be, in this case. The y in yes is a consonant. _ Marco

Re: Scripts in Unicode 4.0

2002-08-14 Thread Radovan Garabik
On Wed, Aug 14, 2002 at 12:55:26PM -0400, John Cowan wrote: Someone asked what new scripts were arriving in Unicode 4.0. This list is taken from the pipeline page (http://www.unicode.org/unicode/alloc/Pipeline.html): what are the plans with Glagolitic script? --

Re: Scripts in Unicode 4.0

2002-08-14 Thread Michael Everson
At 19:44 +0200 2002-08-14, Radovan Garabik wrote: On Wed, Aug 14, 2002 at 12:55:26PM -0400, John Cowan wrote: Someone asked what new scripts were arriving in Unicode 4.0. This list is taken from the pipeline page (http://www.unicode.org/unicode/alloc/Pipeline.html): what are the plans

Re: Gutenberg's ligatures (spins off from Re: Tildes on vowels)

2002-08-14 Thread James Kass
Michael Everson wrote in response to William Overington, I suggest you stop calling it the golden ligatures collection. This term imputes a status and nobility to it which it simply doesn't have. Indeed, I suggest that you abandon this task and use appropriate font technology to

Re: Scripts in Unicode 4.0

2002-08-14 Thread James Kass
Proposed additions can be seen in a couple of PDF format charts provided by Asmus Freytag: BMP proposed additions: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2491.pdf non-BMP proposed additions: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2492.pdf Best regards, James Kass. - Original Message

Re: Furigana

2002-08-14 Thread Kenneth Whistler
Doug (and Michael also): What if I *want* to design an annotation-aware rendering mechanism? Suppose I read Section 13.6 and decide that, instead of just throwing the annotation characters away, I should attempt to display them directly above (and smaller than) the normal text, the way

Re: Scripts in Unicode 4.0

2002-08-14 Thread Theodore H. Smith
Someone asked what new scripts were arriving in Unicode 4.0. This list is taken from the pipeline page (http://www.unicode.org/unicode/alloc/Pipeline.html): Limbu (Kirat), Tai Le, Uralic Phonetic Alphabet (part of Latin, technically), Linear B (syllabary and ideographs), Aegean Numbers

Re: Scripts in Unicode 4.0

2002-08-14 Thread Kenneth Whistler
John Hudson mused: Love the HOT BEVERAGE character, but where's the TALL LOWFAT SOYMILK MOCHA FRAPPUCCINO? Come on guys, there's enough blank spaces in that block for the entire Starbucks beverage menu, especially if you treat things like EXTRA FOAM as a combining character. Well,

Re: Scripts in Unicode 4.0

2002-08-14 Thread Patrick Andries
- Original Message - From: John Hudson [EMAIL PROTECTED] At 11:11 AM 14-08-02, James Kass wrote: BMP proposed additions: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2491.pdf Love the HOT BEVERAGE character, but where's the TALL LOWFAT SOYMILK MOCHA FRAPPUCCINO? Come on guys, there's

Re: Scripts in Unicode 4.0

2002-08-14 Thread James Kass
Patrick Andries wrote in response to John Hudson, Love the HOT BEVERAGE character, but where's the TALL LOWFAT SOYMILK MOCHA FRAPPUCCINO? Come on guys, there's enough blank spaces in that block for the entire Starbucks beverage menu, especially if you treat things like EXTRA FOAM as a

Re: Scripts in Unicode 4.0

2002-08-14 Thread John Cowan
Theodore H. Smith scripsit: Whats the point of having more Latin characters? Do they look like normal Roman characters? I think we have a few versions (3 or more?) of them, already. I thought once was enough. UPA is like IPA: it exploits certain potentials of Latin script that aren't

Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-14 Thread Kenneth Whistler
William Overington teased us all unmercifully with: It occurs to me that it is possible to introduce a convention, either as a matter included in the Unicode specification, or as just a known about thing, that if one has a plain text Unicode file with a file name that has some particular

The mystery of Edwin U+1E9A

2002-08-14 Thread John Cowan
Where does this strange beast come from? Its name is LATIN SMALL LETTER A WITH RIGHT HALF RING, and the right half ring is indeed above the a. We don't have a RIGHT HALF RING ABOVE combining mark, so it only gets a compatibility decomposition. Who would need a lower-case letter with a unique
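The decomposition status John describes can be checked with Python's `unicodedata` module: U+1E9A has only a compatibility mapping, to "a" plus U+02BE MODIFIER LETTER RIGHT HALF RING (a spacing modifier letter, not a combining mark), so canonical normalization leaves it alone and only NFKD/NFKC split it.

```python
import unicodedata

ch = "\u1E9A"
print(unicodedata.name(ch))            # LATIN SMALL LETTER A WITH RIGHT HALF RING
print(unicodedata.decomposition(ch))   # <compat> 0061 02BE

# Canonical decomposition: unchanged, since there is no canonical mapping.
assert unicodedata.normalize("NFD", ch) == ch
# Compatibility decomposition: a + MODIFIER LETTER RIGHT HALF RING.
assert unicodedata.normalize("NFKD", ch) == "a\u02BE"
```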

Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-14 Thread James Kass
Kenneth Whistler wrote in response to William Overington, ...or to pick an extension, more or less at random, say .html The file story7.uof could thus be used with a file named story.txt so as to indicate which objects were intended to be used for three uses of U+FFFC in the file
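Purely to make the proposal concrete: the convention pairs the U+FFFC characters in story.txt, in order, with entries in a companion file. This Python sketch assumes a hypothetical sidecar already read into a list with one object reference per U+FFFC; neither the .uof format nor the `bind_objects` helper is defined anywhere.

```python
OBJECT_REPLACEMENT = "\uFFFC"   # OBJECT REPLACEMENT CHARACTER


def bind_objects(text, object_refs):
    """Pair each U+FFFC in text, in order, with one hypothetical sidecar entry."""
    positions = [i for i, ch in enumerate(text) if ch == OBJECT_REPLACEMENT]
    if len(positions) != len(object_refs):
        raise ValueError(
            f"{len(positions)} U+FFFC characters but "
            f"{len(object_refs)} object references"
        )
    return list(zip(positions, object_refs))
```

The count check shows the convention's weak point: once the text and its companion file drift apart, nothing in the plain text says which object belongs where.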

Re: The mystery of Edwin U+1E9A

2002-08-14 Thread Kenneth Whistler
John Cowan asked: Where does this strange beast come from? Semitic transliteration practice, if I recall correctly. Its name is LATIN SMALL LETTER A WITH RIGHT HALF RING, and the right half ring is indeed above the a. We don't have a RIGHT HALF RING ABOVE combining mark, so it only gets

Re: Discrepancy between Names List Code Charts?

2002-08-14 Thread Kenneth Whistler
This is my first posting to this list so please be gentle with me! *pounces and begins to play with the little furry creature (gently)* Can someone help me with this confusion as I am unsure how I should structure these WITH CEDILLA characters in fonts I'm working on. See TUS 3.0, pp.

unicode web server

2002-08-14 Thread Sarasvati
The unicode web server is off-line for an upgrade. It will be restored to service as soon as possible. -- Sarasvati

Re: New version of TR29:

2002-08-14 Thread Philipp Reichmuth
MC Consonants [j] and [w] have the special status of semivowels in MC romance languages, which means that they often behave as vowels MC do, including in the rules for elision. One has to differentiate between phonemes and graphemes. Unicode, of course, operates on the grapheme level, and thus

Re: New version of TR29:

2002-08-14 Thread Patrick Andries
- Original Message - From: Philipp Reichmuth [EMAIL PROTECTED] MC Consonants [j] and [w] have the special status of semivowels in MC romance languages, which means that they often behave as vowels MC do, including in the rules for elision. One has to differentiate between phonemes

RE: Digraphs as Distinct Logical Units

2002-08-14 Thread Roozbeh Pournader
On Wed, 14 Aug 2002, Marco Cimarosti wrote: Standing its usage in text, couldn't it be considered as a punctuation mark? No, I don't agree. More like a dingbat it looks to me, as far as you don't get very philosophical. If this is the case, decomposing the mark into the Arabic letters it

Re: Dots as far as the eye can see

2002-08-14 Thread Doug Ewell
Markus Scherer markus dot scherer at jtcsv dot com wrote: Note that we have a gazillion other dots already: ... And these are just the obvious ones found with a quick search (and just for the single dots). There are probably more hiding out in little corners of scripts (it's a bit like

The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)

2002-08-14 Thread William Overington
John Cowan wrote as follows. In essence, though not formally, U+FFF9..U+FFFC are non-characters as well, and the Unicode semantics just tells what programs *may* find them useful for. Unicode 4.0 editors: it might be a good idea to emphasize the close relationship of this small repertoire with
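The "though not formally" part can be checked in Python. The 66 formal noncharacters are U+FDD0..U+FDEF plus the last two code points of every plane; `is_noncharacter` is a hypothetical helper that encodes that rule. U+FFF9..U+FFFB come out as assigned format characters (Cf) and U+FFFC as an assigned symbol (So), none of them noncharacters.

```python
import unicodedata


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and U+nFFFE / U+nFFFF in each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


for cp in range(0xFFF9, 0x10000):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch, '<no name>'):45} "
          f"{unicodedata.category(ch)} noncharacter={is_noncharacter(cp)}")
```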