Re: Eleventh hour check on XML 1.1 names

2002-08-13 Thread Andrew C. West
On Mon, 12 August 2002, John Cowan wrote: The following characters are not currently permitted by XML 1.1, but would be in the "recommended" set if they were permitted: U+200E U+200F LEFT-TO-RIGHT/RIGHT-TO-LEFT MARK U+202A..U+202E more bidi controls U+203F U+2040 UNDER/CHARACTER

Re: egyptian example -fixed

2002-08-13 Thread Lars Marius Garshol
* Tex Texin | | The claims that NS and Opera did not yet fully support bidi | undermined fixing this sooner, which is a shame. They seem to | support bidi well enough for the purposes of these examples. Opera does not support bidi, but the operating system thinks it does, and switches the

RE: [unicode] Re[2]: Pronunciation of U+0429

2002-08-13 Thread Vaintroub, Wladislav
in contrast to this, how do you pronounce 'жч' combination in мужчина? Like U+0429 (щ). so ч in плач, матч is pronounced the same as in ночь? Yes, the soft sign after ч does not influence the pronunciation. btw does anyone know how the Belorussian шч is pronounced? I believe it is pronounced

RE: Eleventh hour check on XML 1.1 names

2002-08-13 Thread Marco Cimarosti
John Cowan wrote: The following characters were explicitly permitted by XML 1.0 but are not in the recommended 1.1 set: [...] U+FEFF ZWNBSP How do parsers detect the endianness of XML files in UTF-16 (and the very fact that they are UTF-16)? _ Marco

Re: Tildes on vowels

2002-08-13 Thread William Overington
Tex Texin kindly responded to my question. Firstly, thank you for responding. I have added some comments between items below. William, hi Although the specific proposal may not have been discussed, there has been much discussion, which generalizes well and then can be specifically applied to

whether ASCII for Korean language or not !!!

2002-08-13 Thread Ankur Mahajan
Can anyone tell me whether ASCII is used to denote every character for Korean characters or not, like ASCII denotes 1-256 integer for every character used in English language !!! Regds, Ankur Mahajan, Asst. Systems Engr., Tata Consultancy Services, Gurgaon, Ph. 0124-6342944/941/542 Ext.

RE: Eleventh hour check on XML 1.1 names

2002-08-13 Thread Andrew C. West
On Tue, 13 August 2002, Marco Cimarosti wrote: John Cowan wrote: The following characters were explicitly permitted by XML 1.0 but are not in the "recommended" 1.1 set: [...] U+FEFF ZWNBSP How do parsers detect the endianness of XML files in UTF-16 (and the

Re: Eleventh hour check on XML 1.1 names

2002-08-13 Thread John Cowan
Marco Cimarosti scripsit: How do parsers detect the endianness of XML files in UTF-16 (and the very fact that they are UTF-16)? In XML the BOM is recognized. This has to do with the use of ZWNBSP within a name (element name, attribute name, etc.) -- Deshil Holles eamus. Deshil Holles
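
As a rough illustration of the BOM check mentioned in this reply, the following Python sketch inspects the first two bytes of a document. The function name sniff_utf16_bom is hypothetical, and the sketch assumes the input is not UTF-32 (whose little-endian BOM also begins with FF FE). It is not how any particular XML parser is implemented.

```python
import codecs


def sniff_utf16_bom(data: bytes):
    """Return 'utf-16-be' or 'utf-16-le' if data starts with a UTF-16 BOM.

    Returns None when no UTF-16 BOM is present. Assumption: the caller has
    already ruled out UTF-32, which this sketch does not distinguish.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    return None
```

Here U+FEFF acts purely as a signature at the very start of the byte stream, which is a separate matter from whether ZWNBSP may appear inside an element or attribute name.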

Re: Eleventh hour check on XML 1.1 names

2002-08-13 Thread John Cowan
Andrew C. West scripsit: I assume that the BMP variation selectors refer to those at U+FE00...FE0F (and U+E0100..E01EF when available), but exclude : U+0180B : MONGOLIAN FREE VARIATION SELECTOR ONE U+0180C : MONGOLIAN FREE VARIATION SELECTOR TWO U+0180D : MONGOLIAN FREE VARIATION SELECTOR

Re: Eleventh hour check on XML 1.1 names

2002-08-13 Thread John Cowan
Andrew C. West scripsit: I assume that U+FEFF ZWNBSP is included in this list precisely because it is now used solely with the semantics of a Byte Order Mark, and its original meaning as ZWNBSP is deprecated in favour of U+2060 WORD JOINER. In fact WORD JOINER is excluded from the list as

Re: whether ASCII for Korean language or not !!!

2002-08-13 Thread SOS (Univ. Bonn)
Well obviously not: Unicode 3.0 has AC00 to D7A3 for Hangul Codes. Jan -Original Message- From: Ankur Mahajan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, 13 August 2002 13:12 Subject: whether ASCII for Korean language or not !!!
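
For readers who want to check the range quoted above, here is a small Python sketch. The names HANGUL_FIRST, HANGUL_LAST and is_hangul_syllable are made up for illustration; only the code points AC00 and D7A3 come from the message.

```python
# Range of precomposed Hangul syllables quoted in the message above.
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3


def is_hangul_syllable(ch: str) -> bool:
    """True if the single character ch lies in U+AC00..U+D7A3."""
    return HANGUL_FIRST <= ord(ch) <= HANGUL_LAST


# Number of code points in the range, counting both ends.
HANGUL_COUNT = HANGUL_LAST - HANGUL_FIRST + 1  # 11172
```

A range of 11172 code points cannot fit in the 128 values of 7-bit ASCII, which is the point of the reply.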

RE: Eleventh hour check on XML 1.1 names

2002-08-13 Thread Marco Cimarosti
John Cowan wrote: Marco Cimarosti scripsit: How do parsers detect the endianness of XML files in UTF-16 (and the very fact that they are UTF-16)? In XML the BOM is recognized. This has to do with the use of ZWNBSP within a name (element name, attribute name, etc.) Sorry. I should

Re: whether ASCII for Korean language or not !!!

2002-08-13 Thread Jungshik Shin
On Tue, 13 Aug 2002, SOS (Univ. Bonn) wrote: -Original Message- From: Ankur Mahajan [EMAIL PROTECTED] Sent: Tuesday, 13 August 2002 13:12 Subject: whether ASCII for Korean language or not !!! It's certainly possible to make a single byte character set (with 186=94*2

RE: Tildes on vowels

2002-08-13 Thread Marco Cimarosti
William Overington wrote: 2) Superscript, subscript, combining above, and other forms of identifying placement of characters, are better left to markup or other rendering systems and file formats (and not for a vehicle intended for plain text.) Why? This call for markup seems to be

Re: Tildes on vowels

2002-08-13 Thread Philipp Reichmuth
Hello, William, This is sort of lengthy once more. Forgive me and put me in your score files. :-) 1) The need for such rendering mechanisms in plain text interchange has not been shown. WO Well, what are objective criteria for showing it? Is there any application where it is *needed* to

RE: whether ASCII for Korean language or not !!!

2002-08-13 Thread Marco Cimarosti
Jungshik Shin wrote: [...] Besides, South Korea and North Korea agreed to devise a new ISO 2022 compliant single byte character set for Korean script (Hangul/Chosun-gul/Jeong-eum). (I don't know what it's for though since we now have Unicode.) BTW, I always wondered whether North Korea

Re: Tildes on vowels

2002-08-13 Thread James Kass
William Overington wrote in response to Tex Texin. People might find a control picture style of glyph or several glyphs useful in the PUA for indicating superscripting or other aspects of text presentation. For example, if superscripting were represented as an upwards arrow and

RE: whether ASCII for Korean language or not !!!

2002-08-13 Thread Jungshik Shin
On Tue, 13 Aug 2002, Marco Cimarosti wrote: Jungshik Shin wrote: [...] Besides, South Korea and North Korea agreed to devise a new ISO 2022 compliant single byte character set for Korean script (Hangul/Chosun-gul/Jeong-eum). (I don't know what it's for though since we now have

Re: Double Macrons on gh (was Re: Tildes on Vowels)

2002-08-13 Thread Kenneth Whistler
James Kass asked: Please note that both the UTC and WG2 have approved a new set of combining double accents: U+035D COMBINING DOUBLE BREVE U+035E COMBINING DOUBLE MACRON U+035F COMBINING DOUBLE LOW LINE snip Now, the question is, how long will it take for the fonts and

Re: Digraphs as Distinct Logical Units

2002-08-13 Thread Michael Everson
At 02:54 -0700 2002-08-09, Andrew C. West wrote: Not so outlandish as it may first appear. When Egyptian hieroglyphs get encoded in Unicode, I would not be surprised to see special characters for the cartouched names of pharaohs. Not a chance. No current implementation does this, and no one

Re: Furigana

2002-08-13 Thread Michael Everson
At 12:11 -0700 2002-08-08, Kenneth Whistler wrote: Ah, but read the caveats carefully. The Unicode interlinear annotation characters are *not* intended for interchange, unlike the HTML4 ruby tag. See TUS 3.0, p. 326. They are, essentially, internal-use anchor points. What does this mean? That

Re: Compatibility and Politics (was Re: Digraphs as Distinct Logical Units)

2002-08-13 Thread Michael Everson
At 17:43 -0700 2002-08-08, Kenneth Whistler wrote: I expect so, actually, given its usage. A similar (but not identical) BISMALLAH was requested early for the Thaana script, and I expect that the decision about that will eventually be revisited, as well. I checked and it appears that the glyph

Re: Furigana

2002-08-13 Thread Michael Everson
At 19:59 +0900 2002-08-08, Dan Kogai wrote: On Thursday, August 8, 2002, at 04:17 , Michael Everson wrote: Where do I start looking for information about implementing furigana? Can you have more than one gloss attached to a word? We are considering implementing this for Blissymbols. What do

Re: Digraphs as Distinct Logical Units

2002-08-13 Thread Michael Everson
At 13:58 -0700 2002-08-08, Kenneth Whistler wrote: Characters like the BISMALLAH ARRAHMAN ARRAHIM that meet obvious local requirements and make implementation sense are acceptable to the UTC. Note that the Maldivians asked for the same entity for the same reasons the Pakistanis did but it was

Re: Digraphs as Distinct Logical Units

2002-08-13 Thread Michael Everson
At 08:50 -0700 2002-08-09, Doug Ewell wrote: In the Hebrew tradition, the name of God (Yahweh) is written specially to avoid the appearance of blasphemy. Mark Shoulson and Michael Everson co-wrote a draft proposal in 1998 to encode the Tetragrammaton in Unicode:

Re: Digraphs as Distinct Logical Units

2002-08-13 Thread Michael Everson
At 03:37 +0430 2002-08-09, Roozbeh Pournader wrote: By not providing a compatibility decomposition, we are making the proposed character a healthy and normal character, just like Arabic letters or symbols. It won't be a compatibility character like Chinese and Japanese ones, or other Arabic

RE: Furigana

2002-08-13 Thread Murray Sargent
As Ken says the Unicode interlinear annotation characters are for internal use only. Specifically, their meanings can be different for different programs. If you have your nice marked up text in memory and want to export it for use by some program, you need to use a higher-level protocol that

Re: Furigana

2002-08-13 Thread Kenneth Whistler
I want to be able to send a Blissymbol string with a gloss in English or Swedish attached. Nothing to do with Japanese whatsoever. Basically, as for all things annotational or interlineating, this is an excellent application for markup. --Ken

Re: Furigana

2002-08-13 Thread Philipp Reichmuth
Hi Michael, ME I want to be able to send a Blissymbol string with a gloss in ME English or Swedish attached. Do you need this in plain text? If I understand Blissymbols correctly, this is just to give an explanation of the Blissymbol string, much like giving the Pinyin pronunciation to a Han

Re: Furigana

2002-08-13 Thread Michael Everson
At 14:16 -0700 2002-08-13, Kenneth Whistler wrote: I want to be able to send a Blissymbol string with a gloss in English or Swedish attached. Nothing to do with Japanese whatsoever. Basically, as for all things annotational or interlineating, this is an excellent application for markup.

Re: Furigana

2002-08-13 Thread Michael Everson
At 23:50 +0200 2002-08-13, Philipp Reichmuth wrote: Hi Michael, ME I want to be able to send a Blissymbol string with a gloss in ME English or Swedish attached. Do you need this in plain text? We are exploring what to do. If I understand Blissymbols correctly, this is just to give an

Re: Furigana

2002-08-13 Thread Kenneth Whistler
Michael, At 14:16 -0700 2002-08-13, Kenneth Whistler wrote: I want to be able to send a Blissymbol string with a gloss in English or Swedish attached. Nothing to do with Japanese whatsoever. Basically, as for all things annotational or interlineating, this is an excellent application

Re: Furigana

2002-08-13 Thread Michael Everson
At 16:00 -0700 2002-08-13, Kenneth Whistler wrote: The Japanese national body was very clear about this, and was opposed to these going into the standard unless such clarifications were made, to ensure that these were not intended for plain text interchange of furigana (or other similar

Re: Furigana

2002-08-13 Thread Kenneth Whistler
Michael Everson (in training as a curmudgeon) harrumpfed ;-) The Japanese national body was very clear about this, and was opposed to these going into the standard unless such clarifications were made, to ensure that these were not intended for plain text interchange of furigana (or other

RE: Furigana

2002-08-13 Thread Murray Sargent
Michael Everson said Well then they [interlinear annotation characters] oughtn't to have been encoded. Michael, you aren't an implementer. When you implement things unambiguously, you may need internal code points in your plain-text stream to attach higher-level protocols (such as formatting

Re: RE: Furigana

2002-08-13 Thread starner
Michael, you aren't an implementer. When you implement things unambiguously, you may need internal code points in your plain-text stream to attach higher-level protocols (such as formatting properties) to. That seems to be basically what William Overington is proposing, except these characters

Re: Furigana

2002-08-13 Thread Tex Texin
Murray, It's true implementers need some place to attach higher level protocols, but they don't need specific points for specific implementations of internal protocols. If they weren't good enough to be used for exchange, then simply having some unpurposed code points available for internal use

Re: Furigana

2002-08-13 Thread Tex Texin
Ken, http://www.unicode.org/unicode/uni2book/ch13.pdf As I read that material, I take it to be saying that senders should remove the I.A. characters. Does the standard discuss anywhere filtering the characters on the receiver side? Clearly Murray has good justification for removing the I.A.

Re: Is U+0140 (l with middle dot) ever used?

2002-08-13 Thread Anto'nio Martins-Tuva'lkin
I made a mistake: And, yes, L + middle dot + L is indeed used: in a smallish number of Catalan words, even if the Barcelonian [normative] pronunciation doesn't distinguish between L and L·L, though it doubles a number of other consonants. This should be the Barcelonian [non-normative]

Re: Furigana

2002-08-13 Thread Kenneth Whistler
Tex asked: But does the standard address their removal by receivers (or intermediaries), and does removing them include removing the contained annotation? Yes and yes. p. 326: On input, a plain text receiver should either preserve all characters
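
The removal option discussed here (dropping the annotation characters together with the annotating text) might be sketched in Python as follows. The function name strip_annotations is hypothetical, the sketch assumes well-formed, non-nested annotations, and it is one possible reading of the guideline, not text from the standard.

```python
import re

# Interlinear annotation characters U+FFF9..U+FFFB.
ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR

# Matches the separator, the annotating text, and the terminator.
_ANNOTATION = re.compile(SEPARATOR + ".*?" + TERMINATOR, re.DOTALL)


def strip_annotations(text: str) -> str:
    """Remove annotating text and all three annotation characters.

    The annotated (base) text between ANCHOR and SEPARATOR is kept.
    Assumption: annotations are well formed and not nested.
    """
    without_glosses = _ANNOTATION.sub("", text)
    return without_glosses.replace(ANCHOR, "")
```

For example, an anchor followed by a base word, a separator, a gloss and a terminator is reduced to the base word alone; text with no annotation characters passes through unchanged.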

RE: Furigana

2002-08-13 Thread Murray Sargent
I agree. The current thinking is that U+FFF9 - U+FFFB have no external meaning and shouldn't appear externally, i.e., they are noncharacters in every way except in the spec (sigh). They can be used for whatever an implementer wants internally. I mentioned earlier that the RichEdit edit engine

Re: Furigana

2002-08-13 Thread Tex Texin
Thanks Ken. I don't know how I missed the text on 326 when I scanned it before I mailed. tex Kenneth Whistler wrote: Tex asked: But does the standard address their removal by receivers (or intermediaries), and does removing them include removing the contained annotation? Yes and

Re: Is U+0140 (l with middle dot) ever used?

2002-08-13 Thread John Cowan
Anto'nio Martins-Tuva'lkin scripsit: As for the nature of the middle dot, short of a specific code point attributed to LATIN LETTER CATALAN MIDDLE DOT, there should be something ensuring that this character can be treated as a letter for all things referring to word delimitation (smart

Re: whether ASCII for Korean language or not !!!

2002-08-13 Thread David Hopwood
Jungshik Shin wrote: * Hangul(한글) is used in South Korea while North Korea refers to it as Choseongul(조선글). Recently, Korean ad-hoc group agreed to propose to WG2 that Jeongum(정음 : 正音) be used in place of Hangul/Choseongul in ISO 10646. That's just as

Re: Digraphs as Distinct Logical Units

2002-08-13 Thread Roozbeh Pournader
On Tue, 13 Aug 2002, Michael Everson wrote: Doesn't matter where it's encoded. It is to be considered, if you will pardon the term, as a kind of dingbat, if I understand correctly. I don't have anything against the term. Others may. Because it isn't a logo, is used officially and

Re: Furigana

2002-08-13 Thread James Kass
Kenneth Whistler wrote, The interlinear annotation characters fall in a gray zone, since they are not noncharacters, but by rights ought to have been. Since they are standard characters though, the standard has to provide some guidelines -- and it is simply safer, if you encounter and

Re: Furigana

2002-08-13 Thread John Cowan
James Kass scripsit: Should a character encoding standard ever encode a non-character? Non-characters aren't encoded, they're reserved either for specific purposes or for any desired purpose. Is there such a thing as a non-character with a specific semantic meaning? Why not? Can't apps

Re: Digraphs as Distinct Logical Units

2002-08-13 Thread James Kass
Roozbeh Pournader wrote in reply to Michael Everson, Because it isn't a logo, is used officially and obligatorily in government documents in at least two countries, one of which does not normally use the Arabic script, and it isn't reasonable to expect people to type it in I can't

Re: Furigana

2002-08-13 Thread James Kass
John Cowan wrote, Non-characters aren't encoded, they're reserved either for specific purposes or for any desired purpose. If it's a specific purpose, it seems like it should either fall under character or mark-up. I can understand reserving code points for any desired purpose, such as