On Mon, 12 August 2002, John Cowan wrote:
The following characters are not currently permitted by XML 1.1, but
would be in the "recommended" set if they were permitted:
U+200E U+200F LEFT-TO-RIGHT/RIGHT-TO-LEFT MARK
U+202A..U+202E more bidi controls
U+203F U+2040 UNDERTIE/CHARACTER TIE
* Tex Texin
|
| The claims that NS and Opera did not yet fully support bidi
| undermined fixing this sooner, which is a shame. They seem to
| support bidi well enough for the purposes of these examples.
Opera does not support bidi, but the operating system thinks it does,
and switches the
in contrast to this, how do you pronounce 'жч' combination in мужчина?
Like U+0429 (щ).
so ч in плач, матч is pronounced the same as in ночь?
Yes, the soft sign after ч does not influence the pronunciation.
btw does anyone know how is the Belorussian шч pronounced?
I believe it is pronounced
John Cowan wrote:
The following characters were explicitly permitted by XML 1.0 but are
not in the recommended 1.1 set:
[...]
U+FEFF ZWNBSP
How do parsers detect the endianness of XML files in UTF-16 (and the very
fact that they are UTF-16)?
_ Marco
Tex Texin kindly responded to my question.
Firstly, thank you for responding. I have added some comments between items
below.
William, hi
Although the specific proposal may not have been discussed, there has
been much discussion, which generalizes well and then can be
specifically applied to
Can anyone tell me whether ASCII is used to denote
every character for Korean characters or not,
like ASCII denotes 1-256 integer for every
character used in English language !!!
Regds,
Ankur Mahajan
Asst. Systems Engr.
Tata Consultancy Services
Gurgaon
Ph. 0124-6342944/941/542 Ext.
On Tue, 13 August 2002, Marco Cimarosti wrote:
John Cowan wrote:
> The following characters were explicitly permitted by XML 1.0 but are
> not in the "recommended" 1.1 set:
>
[...]
> U+FEFF ZWNBSP
How do parsers detect the endianness of XML files in UTF-16 (and the
Marco Cimarosti scripsit:
How do parsers detect the endianness of XML files in UTF-16 (and the very
fact that they are UTF-16)?
In XML the BOM is recognized. This has to do with the use of ZWNBSP
within a name (element name, attribute name, etc.)
--
Deshil Holles eamus. Deshil Holles
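Marco's question about detecting UTF-16 and its byte order can be illustrated with a small sketch. It is only a rough approximation of the kind of autodetection described in the non-normative appendix of XML 1.0, not a full implementation: it looks for a BOM first and otherwise for the byte pattern of "<?" in each byte order. The function name sniff_utf16 is made up for this example.

```python
import codecs

def sniff_utf16(data: bytes):
    """Return 'utf-16-be', 'utf-16-le', or None if no UTF-16 signal is found."""
    # FF FE 00 00 is the UTF-32 little-endian BOM; rule it out first,
    # because it begins with the same two bytes as the UTF-16 LE BOM.
    if data.startswith(codecs.BOM_UTF32_LE):
        return None
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    # No BOM: an XML declaration starts with "<?", whose bytes reveal
    # both the 16-bit code unit size and the byte order.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    return None
```

A real parser would go on to read the encoding declaration and handle the other encoding families; this sketch stops at the UTF-16 decision.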
Andrew C. West scripsit:
I assume that the BMP variation selectors refer to those at U+FE00...FE0F (and
U+E0100..E01EF when
available), but exclude :
U+0180B : MONGOLIAN FREE VARIATION SELECTOR ONE
U+0180C : MONGOLIAN FREE VARIATION SELECTOR TWO
U+0180D : MONGOLIAN FREE VARIATION SELECTOR
Andrew C. West scripsit:
I assume that U+FEFF ZWNBSP is included in this list precisely because it is now
used solely with
the semantics of a Byte Order Mark, and its original meaning as ZWNBSP is deprecated
in favour of
U+2060 WORD JOINER.
In fact WORD JOINER is excluded from the list as
Well obviously not: Unicode 3.0 has AC00 to D7A3 for Hangul Codes.
Jan
-Original Message-
From: Ankur Mahajan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, 13 August 2002 13:12
Subject: whether ASCII for Korean language or not !!!
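To make Jan's answer concrete: ASCII is a 7-bit set covering only code points 0 to 127, while the precomposed Hangul syllables occupy U+AC00..U+D7A3 in Unicode. The sketch below checks both ranges and splits a syllable into its lead, vowel, and tail indices using the standard arithmetic (19 leads, 21 vowels, 28 tails including "no tail"). The function names are illustrative only.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3
VOWELS, TAILS = 21, 28  # 28 tails includes the "no tail" case

def is_ascii(ch):
    """True if the character is in the 7-bit ASCII range."""
    return ord(ch) < 0x80

def is_hangul_syllable(ch):
    """True if the character is a precomposed Hangul syllable."""
    return HANGUL_FIRST <= ord(ch) <= HANGUL_LAST

def decompose(ch):
    """Return (lead, vowel, tail) indices of a precomposed syllable."""
    if not is_hangul_syllable(ch):
        raise ValueError("not a precomposed Hangul syllable")
    index = ord(ch) - HANGUL_FIRST
    lead, rest = divmod(index, VOWELS * TAILS)
    vowel, tail = divmod(rest, TAILS)
    return lead, vowel, tail
```

For instance, the block holds 19 * 21 * 28 = 11172 syllables, and none of them is an ASCII character.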
John Cowan wrote:
Marco Cimarosti scripsit:
How do parsers detect the endianness of XML files in UTF-16
(and the very
fact that they are UTF-16)?
In XML the BOM is recognized. This has to do with the use of ZWNBSP
within a name (element name, attribute name, etc.)
Sorry. I should
On Tue, 13 Aug 2002, SOS (Univ. Bonn) wrote:
-Original Message-
From: Ankur Mahajan [EMAIL PROTECTED]
Sent: Tuesday, 13 August 2002 13:12
Subject: whether ASCII for Korean language or not !!!
It's certainly possible to make a single byte character
set (with 188=94*2
William Overington wrote:
2) Superscript, subscript, combining above, and other forms of
identifying placement of characters, are better left to
markup or other
rendering systems and file formats (and not for a vehicle
intended for
plain text.)
Why? This call for markup seems to be
Hello, William,
This is sort of lengthy once more. Forgive me and put me in your score
files. :-)
1) The need for such rendering mechanisms in plain text interchange
has not been shown.
WO Well, what are objective criteria for showing it?
Is there any application where it is *needed* to
Jungshik Shin wrote:
[...] Besides, South Korea and North Korea agreed to
devise a new ISO 2022 compliant single byte character set for Korean
script (Hangul/Chosun-gul/Jeong-eum). (I don't know what it's
for though since we now have Unicode.)
BTW, I always wondered whether North Korea
William Overington wrote in response to Tex Texin.
People might find a control picture style of glyph or several glyphs
useful in the PUA for indicating superscripting or other aspects of
text presentation. For example, if superscripting were represented
as an upwards arrow and
On Tue, 13 Aug 2002, Marco Cimarosti wrote:
Jungshik Shin wrote:
[...] Besides, South Korea and North Korea agreed to
devise a new ISO 2022 compliant single byte character set for Korean
script (Hangul/Chosun-gul/Jeong-eum). (I don't know what it's
for though since we now have
James Kass asked:
Please note that both the UTC and WG2 have approved a new set
of combining double accents:
U+035D COMBINING DOUBLE BREVE
U+035E COMBINING DOUBLE MACRON
U+035F COMBINING DOUBLE LOW LINE
snip
Now, the question is, how long will it take for the fonts and
At 02:54 -0700 2002-08-09, Andrew C. West wrote:
Not so outlandish as it may first appear. When Egyptian hieroglyphs
get encoded in Unicode, I would not be surprised to see special
characters for the cartouched names of pharaohs.
Not a chance. No current implementation does this, and no one
At 12:11 -0700 2002-08-08, Kenneth Whistler wrote:
Ah, but read the caveats carefully. The Unicode interlinear
annotation characters are *not* intended for interchange, unlike
the HTML4 ruby tag. See TUS 3.0, p. 326. They are, essentially,
internal-use anchor points.
What does this mean? That
At 17:43 -0700 2002-08-08, Kenneth Whistler wrote:
I expect so, actually, given its usage. A similar (but not identical)
BISMALLAH was requested early for the Thaana script, and I expect that
the decision about that will eventually be revisited, as well.
I checked and it appears that the glyph
At 19:59 +0900 2002-08-08, Dan Kogai wrote:
On Thursday, August 8, 2002, at 04:17 , Michael Everson wrote:
Where do I start looking for information about implementing
furigana? Can you have more than one gloss attached to a word? We
are considering implementing this for Blissymbols.
What do
At 13:58 -0700 2002-08-08, Kenneth Whistler wrote:
Characters like the BISMALLAH ARRAHMAN ARRAHIM that meet obvious
local requirements and make implementation sense are acceptable
to the UTC.
Note that the Maldivians asked for the same entity for the same
reasons the Pakistanis did but it was
At 08:50 -0700 2002-08-09, Doug Ewell wrote:
In the Hebrew tradition, the name of God
(Yahweh) is written specially to avoid the appearance of blasphemy.
Mark Shoulson and Michael Everson co-wrote a draft proposal in 1998 to
encode the Tetragrammaton in Unicode:
At 03:37 +0430 2002-08-09, Roozbeh Pournader wrote:
By not providing a compatibility decomposition, we are making the proposed
character a healthy and normal character, just like Arabic letters or
symbols. It won't be a compatibility character like Chinese and Japanese
ones, or other Arabic
As Ken says the Unicode interlinear annotation characters are for
internal use only. Specifically, their meanings can be different for
different programs. If you have your nice marked up text in memory and
want to export it for use by some program, you need to use a
higher-level protocol that
I want to be able to send a Blissymbol string with a gloss in English
or Swedish attached. Nothing to do with Japanese whatsoever.
Basically, as for all things annotational or interlineating, this
is an excellent application for markup.
--Ken
Hi Michael,
ME I want to be able to send a Blissymbol string with a gloss in
ME English or Swedish attached.
Do you need this in plain text? If I understand Blissymbols correctly,
this is just to give an explanation of the Blissymbol string, much
like giving the Pinyin pronunciation to a Han
At 14:16 -0700 2002-08-13, Kenneth Whistler wrote:
I want to be able to send a Blissymbol string with a gloss in English
or Swedish attached. Nothing to do with Japanese whatsoever.
Basically, as for all things annotational or interlineating, this
is an excellent application for markup.
At 23:50 +0200 2002-08-13, Philipp Reichmuth wrote:
Hi Michael,
ME I want to be able to send a Blissymbol string with a gloss in
ME English or Swedish attached.
Do you need this in plain text?
We are exploring what to do.
If I understand Blissymbols correctly,
this is just to give an
Michael,
At 14:16 -0700 2002-08-13, Kenneth Whistler wrote:
I want to be able to send a Blissymbol string with a gloss in English
or Swedish attached. Nothing to do with Japanese whatsoever.
Basically, as for all things annotational or interlineating, this
is an excellent application
At 16:00 -0700 2002-08-13, Kenneth Whistler wrote
The Japanese national body was very clear about this, and was opposed
to these going into the standard unless such clarifications were made,
to ensure that these were not intended for plain text interchange
of furigana (or other similar
Michael Everson (in training as a curmudgeon) harrumpfed ;-)
The Japanese national body was very clear about this, and was opposed
to these going into the standard unless such clarifications were made,
to ensure that these were not intended for plain text interchange
of furigana (or other
Michael Everson said Well then they [interlinear annotation characters]
oughtn't to have been encoded.
Michael, you aren't an implementer. When you implement things
unambiguously, you may need internal code points in your plain-text
stream to attach higher-level protocols (such as formatting
Michael, you aren't an implementer. When you implement things
unambiguously, you may need internal code points in your plain-text
stream to attach higher-level protocols (such as formatting properties)
to.
That seems to be basically what William Overington is proposing,
except these characters
Murray,
It's true implementers need some place to attach higher level
protocols, but they don't need specific points for specific
implementations of internal protocols. If they weren't good enough to be
used for exchange, then simply having some unpurposed code points
available for internal use
Ken,
http://www.unicode.org/unicode/uni2book/ch13.pdf
As I read that material, I take it to be saying that senders should
remove the I.A. characters.
Does the standard discuss anywhere filtering the characters on the
receiver side?
Clearly Murray has good justification for removing the I.A.
I made a mistake:
And, yes, L + middle dot + L is indeed used: in a smallish number of
Catalan words, even if the Barcelonian [normative] pronunciation
doesn't distinguish between L and L·L, though it doubles a number
of other consonants.
This should be the Barcelonian [non-normative]
Tex asked:
But does the standard address their removal by receivers (or
intermediaries) , and does removing them include removing the contained
annotation?
Yes and yes. p. 326:
On input, a plain text receiver should either preserve all characters
I agree. The current thinking is that U+FFF9 - U+FFFB have no
external meaning and shouldn't appear externally, i.e., they are
noncharacters in every way except in the spec (sigh). They can be used
for whatever an implementer wants internally. I mentioned earlier that
the RichEdit edit engine
Thanks Ken. I don't know how I missed the text on 326 when I scanned it
before I mailed.
tex
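The receiver-side filtering discussed in this exchange can be sketched as follows. This assumes the second option from the guidance on p. 326 as summarized in the thread: drop the three interlinear annotation characters (U+FFF9 anchor, U+FFFA separator, U+FFFB terminator) together with the annotating text between separator and terminator, keeping the annotated base text. The function name and the handling of stray or nested controls are choices made for this example, not something the standard prescribes.

```python
ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR

def strip_annotations(text):
    """Remove annotation controls and annotating text, keep base text."""
    out = []
    # One flag per open annotation: True once its separator has been
    # seen, i.e. while we are inside the annotating text.
    stack = []
    for ch in text:
        if ch == ANCHOR:
            stack.append(False)
        elif ch == SEPARATOR:
            if stack:
                stack[-1] = True
        elif ch == TERMINATOR:
            if stack:
                stack.pop()
        elif not any(stack):
            # Ordinary character outside every annotating text.
            out.append(ch)
    return "".join(out)
```

Stray separators or terminators with no matching anchor are simply dropped here; a stricter receiver might prefer to preserve the whole string unchanged instead.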
Kenneth Whistler wrote:
Tex asked:
But does the standard address their removal by receivers (or
intermediaries) , and does removing them include removing the contained
annotation?
Yes and
Anto'nio Martins-Tuva'lkin scripsit:
As for the nature of the middle dot, short of a specific code point
attributed to LATIN LETTER CATALAN MIDDLE DOT, there should be something
ensuring that this character can be treated as a letter for all things
referring to word delimitation (smart
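One way to see the word-delimitation problem is with a regular-expression tokenizer. The sketch below is only an illustration of the idea of treating U+00B7 MIDDLE DOT as word-internal when it sits between word characters; it is not how any particular product handles Catalan, and the name words is made up for this example.

```python
import re

MIDDLE_DOT = "\u00B7"  # MIDDLE DOT, as used in Catalan l·l

# A word is a run of word characters, optionally continued across a
# middle dot that has word characters on both sides of it.
WORD = re.compile(r"\w+(?:" + MIDDLE_DOT + r"\w+)*")

def words(text):
    """Split text into words, keeping l·l sequences in one piece."""
    return WORD.findall(text)
```

A plain `\w+` pattern would break "col·lecció" into two pieces at the dot, which is the behaviour the message is warning about.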
-BEGIN PGP SIGNED MESSAGE-
Jungshik Shin wrote:
* Hangul(한글) is used in South Korea while North Korea refers to
it as Choseongul(조선글). Recently, Korean ad-hoc group agreed
to propose to WG2 that Jeongum(정음 : 正音) be used in place of
Hangul/Choseongul in ISO 10646.
That's just as
On Tue, 13 Aug 2002, Michael Everson wrote:
Doesn't matter where it's encoded. It is to be considered, if you
will pardon the term, as a kind of dingbat, if I understand correctly.
I don't have anything against the term. Others may.
Because it isn't a logo, is used officially and
Kenneth Whistler wrote,
The interlinear annotation characters fall in a gray zone, since
they are not noncharacters, but by rights ought to have been.
Since they are standard characters though, the standard has to
provide some guidelines -- and it is simply safer, if you encounter
and
James Kass scripsit:
Should a character encoding standard ever encode a non-character?
Non-characters aren't encoded, they're reserved either for specific
purposes or for any desired purpose.
Is there such a thing as a non-character with a specific semantic
meaning?
Why not?
Can't apps
Roozbeh Pournader wrote in reply to Michael Everson,
Because it isn't a logo, is used officially and obligatorily in
government documents in at least two countries, one of which does not
normally use the Arabic script, and it isn't reasonable to expect
people to type it in
I can't
John Cowan wrote,
Non-characters aren't encoded, they're reserved either for specific
purposes or for any desired purpose.
If it's a specific purpose, it seems like it should either fall under
character or mark-up.
I can understand reserving code points for any desired purpose,
such as