From: John Cowan [EMAIL PROTECTED]
Peter Kirk scripsit:
On 13/08/2003 11:09, Philippe Verdy wrote:
... For this reason, defective
combining sequences (combining characters without a leading base
character) should be forbidden (invalid for XML).
If there is even the remotest
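A rough way to see what the proposed restriction would have to detect: the sketch below, in Python, flags text that begins with a combining mark and so has no base character in front of it. The helper name starts_with_combining_mark is hypothetical, and checking only the first character is a simplifying assumption; a full check would also look after control and format characters.

```python
import unicodedata

def starts_with_combining_mark(text: str) -> bool:
    """Return True if text begins with a combining mark (general category Mn, Mc or Me)."""
    return bool(text) and unicodedata.category(text[0]).startswith("M")

# U+0301 COMBINING ACUTE ACCENT with nothing before it: a defective sequence.
print(starts_with_combining_mark("\u0301a"))
# The same mark after a base letter.
print(starts_with_combining_mark("a\u0301"))
# The same mark after SPACE, which acts as an explicit base.
print(starts_with_combining_mark(" \u0301"))
```

Only the first call reports a leading combining mark; the SPACE in the third string is what keeps that sequence from being defective, which is why stripping or folding that space matters.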
- Original Message -
From: Jon Hanna [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, August 14, 2003 1:49 PM
Subject: RE: Questions on ZWNBS - for line initial holam plus alef
I do agree: an XML document could require the use at some place of a
given attribute or element. If
OK, it's safe, but it is a misuse of Unicode. As space plus combining
character is a unit in Unicode, it should be treated as a unit by higher
level protocols. If higher level protocols are allowed to do arbitrary
things within Unicode units, there is no end to the possible confusion.
See for
At 23:35 +0200 2003-08-05, Pim Blokland wrote:
I have absolutely no idea what you are talking about.
You are lucky not having to put up with bad English like five euro
and six cent, living in the Netherlands and speaking Dutch as you
do. See http://www.evertype.com/standards/euro if you wish to
Anyway, John J, what code are we talking about that has to
work from
the positions of the combining marks back to the underlying
representation? Are you talking about OCR?
No, the issue is more how to start from a base form and work
forward to
encompass the whole series of
From: Peter Kirk [EMAIL PROTECTED]
I note that there is no line break opportunity in space, NBSP. But
is
there one after the space in space, RLM, NBSP? If so, RLM, NBSP,
combining character has a third advantage, that it gives the right
line
break opportunity when this sequence is word
On 14/08/2003 09:54, Michael Everson wrote:
Lepton in Greek was accepted from the beginning.
Leptó pl leptá.
The same word as the original widow's mite (Mark 12:42). Probably worth
even less now!
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
On 2003.06.12, 18:38, Philippe Verdy [EMAIL PROTECTED] wrote:
Capital letters simply don't use ascents or descents, and thus they
occupy a *smaller* space than the lowercase letters.
Some upper case letters commonly (i.e. in some typical fonts) have
descents, especially, though not only, in
Peter Kirk suggested...
Interesting and a little embarrassing that Unicode's own documentation
is not Unicode compatible!
I don't think it's very embarrassing... The Unicode consortium after all
doesn't produce book editing and typesetting software, we use other
peoples' software.
I think
- Original Message -
From: Marco Cimarosti [EMAIL PROTECTED]
Anto'nio Martins-Tuva'lkin wrote:
After all the euro is a common currency and its figures should be
written in a common way.
Why?
Very good question. Multilingual countries like Belgium or Canada already
were or are
John Cowan asked:
I realize that existing compatibility decompositions are a rag-bag,
especially those marked with the generic compat tag rather than one
of the specific tags such as font, initial, or super. I wonder
what principles, if any, can be enunciated for giving a newly introduced
-Original Message-
From: John Cowan [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 14, 2003 10:20 AM
To: Magda Danish (Unicode)
Cc: Unicode Core List; [EMAIL PROTECTED]
Subject: Re: Pre-orders of The Unicode Standard, Version 4.0
Thanks. Is the Unicode Consortium in any way
Peter, in XML you really don't want to use attributes for any general
text; there are too many restrictions on the content. For example, we
never put translatable text into them. Attributes should really be
treated more like sequences of symbols, with a constrained syntax.
This is also not in
From: Peter Kirk [EMAIL PROTECTED]
There is some potential for real trouble here, if one process outputs
an
NMTOKEN starting with a combining character preceded by a separating
space, or something else which is changed into a space, and another
process takes the new space plus combining
On 11/08/2003 17:37, Kenneth Whistler wrote:
Well, I've been promising that good things would come
to those who wait. ;-)
At last, the Unicode website has been updated with the
online chapters for Unicode 4.0. See:
http://www.unicode.org/versions/Unicode4.0.0/
Or just go to the Unicode 4.0 link
Magda Danish (Unicode) scripsit:
To order, please use the book order form at
http://www.unicode.org/book/bookform.html
Thanks. Is the Unicode Consortium in any way benefited (or disadvantaged)
if non-members order through it rather than through Amazon or BN?
--
John Cowan [EMAIL
At 13:12 -0700 2003-08-07, Peter Kirk wrote:
Well, it seems to me that in the case of the Aramaic proposal we
don't even have that. We have an archaic version of the script which
is now used mainly for Hebrew, and which many scholars still call
Aramaic (in distinction from paleo-Hebrew)
The beta period for Unicode 4.0.1 has now started. Detailed information is
available on the beta page:
http://www.unicode.org/versions/beta.html
Beta versions of Unicode 4.0.1 data files are now available for public
comment here:
http://www.unicode.org/Public/4.0-Update1/
Elaine,
I really, really, really don't have time to debug your
dissatisfaction with the use of the word Aramaic in the Roadmaps.
This is NOT something anyone is working actively on right now. When a
proposal comes forth, there will be evidence in it that can be picked
at.
In actuality, one
Peter responded to Mark:
On 05/08/2003 14:40, Mark Davis wrote:
Where did you get the notion that space is not a base character? And
base characters include those that are not control or format
characters. Space is neither one.
The standard specifically states in a number of places that
the
solution with
SPACE is really tricky due to the special treatment of SPACE notably
in HTML, SGML, XML
I disagree. There are a few different things that happen with whitespace in
such technologies. Some of these only apply to elements that do not allow
any character data apart from
Peter Kirk peter dot r dot kirk at ntlworld dot com wrote:
Point taken. But when different fonts and rendering engines give
different results because the standard is unclear or ambiguous, that
is a matter for the discussion here. And when conforming fonts and
rendering engines fail to give
there is no such thing as NFD decompositions.
Sorry for the confusion. Still even with a NFKD decomposition,
And there is no such thing as NFKD decomposition either.
It goes as follows, in steps:
1. Canonical and compatibility decomposition mappings (one-step),
and canonical classes.
According to the docs at
http://www.microsoft.com/typography/otfntdev/indicot/other.htm,
uniscribe renders combining marks in isolation when they are
applied to SPACE + ZWJ. (Without the ZWJ, it uses a dotted
circle.) Perhaps this is an acceptable solution to the
people calling for a new
Ah, now you're making assumptions about me which are not, in fact, valid.
I'm not quite sure exactly what you mean by the text, but I own a copy of
The Unicode Standard Version 3.0 and have read it pretty much in entirety.
I have also read almost everything I could find on the unicode.org web
On 07/08/2003 13:57, John Cowan wrote:
Kent Karlsson scripsit:
4) Encode the vowel signs as combining characters, after
the base characters they logically follow. Consider them as
double [width] combining characters, that happen to
have no ink above/below the character they apply to,
Jay Chandru scripsit:
I wanted to know the differences between AL32UTF8 and UTF8. My database (oracle)
will be in AL32UTF8 format. Will the applications that require multibyte characters
work as they are functioning in UTF8 format?
The Oracle UTF8 format is really CESU-8, whereas the
At 01:18 +0200 2003-08-09, Philippe Verdy wrote:
Such a break in the middle of a multiple-width diacritic exists in some
notations, and is not considered horrible typography. Just look
at musical notation, where an upper horizontal parenthesis
is used to group some elements [...]
Music setting is
on 2003-08-06 15:24 Doug Ewell wrote:
I'm not a typographer (intelligent or otherwise), but I'm having a tough
time seeing how Section 2.10 *requires* fonts and rendering engines to
give a space-plus-combining-diacritic combination the exact minimum
width of the diacritic alone, or to leave equal
REGISTER THIS WEEK AND SAVE
ON
EARLY-BIRD CONFERENCE AND HOTEL RATES!
Are you falling behind? Version 4.0 of the Unicode Standard is here!
Software and Web applications can now support more languages with
greater
On Sunday, August 10, 2003 9:30 AM, Mark Davis [EMAIL PROTECTED] wrote:
As for oe-ligature, the
French representative to WG3 (or its predecessor) said that France
could live without it.
Even worse; the story I heard was that the committee had planned from
the start to have and in
Jon Hanna scripsit:
If this is not the case (I'm not entirely sure this bans what XML does with
spaces) then all we would need is a change so that rather than a de facto
ban on space+combining within names and nmtokens we would have an explicit
ban on the same; then we'd all be happy, except
On 05/08/2003 14:40, Mark Davis wrote:
Where did you get the notion that space is not a base character? And
base characters include those that are not control or format
characters. Space is neither one.
The standard specifically states in a number of places that to exhibit
a combining mark in
At 01:30 +0200 2003-08-10, Philippe Verdy wrote:
Whatever you think, the SPACE+diacritic is still a hack, and
certainly not a canonical equivalent (including for its properties),
of the existing spacing diacritics, which also do not fit all usages
because they are symbols.
It is the formally
what code are we talking about that has to work from the
positions of the combining marks back to the underlying representation?
Such code is not just common and widespread, it is practically ubiquitous.
The principle of base characters always coming first is used:
Whenever you need to
Aren't the replies about Unicode 3.2 (or maybe 4.0) rather than 3.1?
1651 - Supplementary Plane 2 - \2e80 - \u2f00
Plane 2 covers U+20000 to U+2FFFF, and is not in the BMP (= Plane 0).
/kent k
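The plane of a code point is simply its value divided by 0x10000, so the range quoted above can be checked mechanically. A minimal sketch in Python; the function name plane_of is made up for illustration.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that a code point belongs to."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 0x10000 code points, so the plane is the value shifted right by 16 bits.
    return code_point >> 16

for cp in (0x2E80, 0x2F00, 0x20000, 0x2FFFF):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```

U+2E80 and U+2F00 come out in plane 0 (the BMP), while U+20000 and U+2FFFF come out in plane 2.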
On 10/08/2003 18:44, Doug Ewell wrote:
Has it occurred to anyone yet that the very *concept* of spacing
diacritics is a hack? Spacing diacritics are used to conduct a sort of
meta-discussion about characters, as in A base character o is combined
with an acute accent to create . They are not
On 08/08/2003 09:54, Jim Allan wrote:
...
It certainly makes sense that in the case of space characters that
have a defined width that this width is innate to the definition of
the character and in such a case should take precedence over the width
of the normally non-spacing combining
My congratulations to Ken, Julie, and Eric! For those who might not know,
this trio (especially Eric with the online bit) get our unadulterated love
and appreciation...Lots of difficulties on the road to online Unicode 4.0
:-) !!
Lisa
- Forwarded by Lisa Moore/Santa Teresa/IBM on
Elaine,
I disagree with you.
Just because Semitic languages *can* be represented in the Hebrew
script does not mean that every script is just a font variant of the
Hebrew script.
There are genetic relationships of the development of the scripts
which are involved in our analysis so far.
James H. Cloos Jr. wrote:
Anto'nio == Anto'nio Martins-Tuva'lkin [EMAIL PROTECTED] writes:
Anto'nio (Let alone the validity of things
Anto'nio like k, c etc.)
I'm sure things like m, k, M and even G will come into use,
though I expect more will use them in front of the digits.
I certainly use
John Cowan asked:
I would like to ask the old farts^W^Wrespected elders of the UTC
which principle they consider more important, abstractly speaking:
the principle that combining marks always follow their base characters
(a typographical principle), or that text is stored, with a few minor
From: Jon Hanna [EMAIL PROTECTED]
I was saying that it wouldn't be sensible to begin a line with a
combining diacritic, since that combining diacritic would be combining
with a newline character which it's difficult to think of any possible
sensible meaning for.
A newline is a control with
Ted Hopp asked:
I believe that reasonable people might reasonably conclude from factoids 1
and 2 that SPACE is indeed a format character.
Reasonable, but evidently wrong. Explanation, please?
I provided the text deconstruction in my last email, but to
continue, the confusion arises from the
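The character database itself can settle which characters are control or format characters. The following Python sketch reads the General_Category values through the standard unicodedata module; the values reflect whatever Unicode version ships with the Python in use, not necessarily the 4.0 data discussed here, and the helper name is_control_or_format is an assumption for illustration.

```python
import unicodedata

def is_control_or_format(ch: str) -> bool:
    """True when the general category is Cc (control) or Cf (format)."""
    return unicodedata.category(ch) in ("Cc", "Cf")

# SPACE, NO-BREAK SPACE, ZERO WIDTH JOINER, LINE FEED
for ch in ("\u0020", "\u00A0", "\u200D", "\u000A"):
    print(f"U+{ord(ch):04X} category={unicodedata.category(ch)} "
          f"control_or_format={is_control_or_format(ch)}")
```

SPACE and NO-BREAK SPACE report category Zs (space separator), which is neither Cc nor Cf, whereas ZERO WIDTH JOINER is Cf and LINE FEED is Cc.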
At 00:52 +0100 2003-08-14, Anto'nio Martins-Tuva'lkin wrote:
Using the cent sign is mostly US specific and the symbol is not
recognized as such in most European countries, so the cent sign is
bound directly to the dollar.
If the dollar sign can be used for currencies other than the USD,
Ken Whistler posted:
Of course a standard which mandates space folding is also
within its rights to mandate, for example, the non-use of
nonspacing marks applied to SPACE characters. It can simply
rule out such sequences as valid for its context, in which
case the problem goes away.
And for such
Peter Kirk wrote:
I think this may be a Peter mistake. I meant to refer to spacing
diacritics. Sorry.
It is certainly highly inappropriate for spacing diacritics to
be considered word boundaries.
Why? It is entirely dependent on the orthography and conventions
involved. There is probably
On 06/08/2003 14:04, John Jenkins wrote:
Speaking purely as an old fart, I'd say the former. We already break
the latter principle in Thai and Lao, and having to be prepared to scan
either forward or backward from a base character in order to find its
combining marks would add overhead to a lot
We need an official Unicode Lint.
Jony
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
Sent: Thursday, August 07, 2003 4:28 PM
To: [EMAIL PROTECTED]
Subject: SPAM: Re: Questions on ZWNBS - for line initial
holam plus alef
On
On 08/08/2003 08:54, Philippe Verdy wrote:
... Could there be another codepoint assigned that has
these properties:
20CF;ZERO WIDTH SYMBOL;Sk;0;ON;compat 0020N;
i.e. being considered symbolic, not a whitespace, with
combining class 0 (not combining), and used as an
explicit base for a
Indeed, pardon my haste, that was a matter of an addition to the Syriac
script. For a comparison of the various scripts used for Sogdian,
http://iranianlanguages.com/midiranian/sogdian.htm#Alphabet
Raymond
- Original Message -
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL
On 08/08/2003 13:56, Thomas M. Widmann wrote:
Peter Kirk [EMAIL PROTECTED] writes:
On 08/08/2003 08:54, Philippe Verdy wrote:
... Could there be another codepoint assigned that has
these properties:
20CF;ZERO WIDTH SYMBOL;Sk;0;ON;compat 0020N;
[...]
But I'm not sure
Elaine Keown responded to Michael:
I really, really, really don't have time to debug your
dissatisfaction with the use of the word Aramaic in the Roadmaps.
This is NOT something anyone is working actively on right now. When a
I'm not writing about nomenclature---not the point at all.
I'm
Mark Davis scripsit:
I repeat again. Nothing on this list has any guarantee that it will be
seen by anyone in the UTC. If you want to submit a FAQ question that's
great -- and I strongly encourage it. But please use:
http://www.unicode.org/reporting.html to make sure it is tracked.
Hearing
A note for those interested in how CGJ may be used in font lookups:
In the current MS implementation (Office 2002, Wordpad, etc.) if CGJ is
inserted immediately after a space character it breaks RTL directionality.
So for the time being at least, any use of CGJ to affect rendering in
Biblical
Madison
Hi,
Only two people asked me what else exists
in the complete Hebrew character set, but
maybe others care.
The significant points here are that there are
other pointing systems to be combined with base
letters and that there are manuscripts that have
TWO pointing systems
On 05/08/2003 09:42, Jim Allan wrote:
Peter Kirk posted:
If I want to do this, should I explicitly encode a dotted circle, or
should I encode nothing and expect the font to generate the dotted
circle, as it often does?
I think that the practice of a font or application automatically inserting
a
Elaine Keown
still in Madison
Dear John Cowan and Peter Kirk:
Could you possibly explain to me why these
other organizations---IETF and W3-- are
apparently concerned about character properties,
to the point where apparently they also have
a hand in deciding what will happen
There are omissions in Michael
Everson's chart in
http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf
The chart was based on Semitic languages, although
purporting to be about scripts. After all Greek and Latin also derive
from the same family of scripts, as we all learn from page 1 of
3) In attribute values that have a declared type other than
CDATA, multiple
spaces are compressed to a single space, and leading and
trailing spaces
are removed. After this is done, there can be no spaces in attributes
of type ID, IDREF, ENTITY, NMTOKEN, NOTATION, or enumerated
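To make the effect on a space followed by a combining mark concrete, here is a simplified sketch in Python of the two steps described above. It is an illustration under stated assumptions, not a conforming XML processor: entity and character references and line-end handling are left out, and the function name normalize_attribute_value is hypothetical.

```python
import re

def normalize_attribute_value(value: str, declared_type: str = "CDATA") -> str:
    """Simplified attribute-value normalization (no entity or character references)."""
    # All declared types: TAB, LF and CR are each replaced by a single space.
    value = re.sub(r"[\t\n\r]", " ", value)
    if declared_type != "CDATA":
        # Non-CDATA types: collapse runs of spaces, then drop leading and trailing spaces.
        value = re.sub(r" +", " ", value).strip(" ")
    return value

# A token list whose second token starts with U+0301 COMBINING ACUTE ACCENT.
raw = "  abc\t \u0301def  "
print(repr(normalize_attribute_value(raw, "NMTOKENS")))
print(repr(normalize_attribute_value(raw, "CDATA")))
```

In the NMTOKENS case the separating space survives as a single space directly in front of the combining mark, which is exactly the space-plus-combining-mark sequence the thread is worried about.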
[EMAIL PROTECTED] scripsit:
Could you possibly explain to me why these
other organizations---IETF and W3-- are
apparently concerned about character properties,
to the point where apparently they also have
a hand in deciding what will happen with
Hebrew?
For a long time, I thought that
The NFD decompositions of spacing marks are already defined as a SPACE
plus a non-spacing combining character.
Philippe, please! Those are *compatibility* decompositions. The normal
form NFD only uses *canonical* decompositions. And there is no such
thing as NFD decompositions.
/kent
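The difference between the two kinds of mapping is easy to observe with Python's standard unicodedata module. A small sketch, using U+00B4 ACUTE ACCENT as the spacing mark and U+00E9 as a contrasting precomposed letter; the data comes from the Unicode version bundled with the interpreter.

```python
import unicodedata

ACUTE = "\u00B4"      # ACUTE ACCENT, a spacing diacritic
E_ACUTE = "\u00E9"    # LATIN SMALL LETTER E WITH ACUTE

# The raw decomposition mapping of U+00B4 carries a <compat> tag.
print(unicodedata.decomposition(ACUTE))

# NFD applies canonical mappings only, so the spacing accent is left alone.
nfd_acute = unicodedata.normalize("NFD", ACUTE)
# NFKD also applies compatibility mappings: SPACE followed by U+0301.
nfkd_acute = unicodedata.normalize("NFKD", ACUTE)
# A canonical mapping, by contrast, is applied by NFD as well.
nfd_e_acute = unicodedata.normalize("NFD", E_ACUTE)

for label, s in (("NFD(U+00B4)", nfd_acute),
                 ("NFKD(U+00B4)", nfkd_acute),
                 ("NFD(U+00E9)", nfd_e_acute)):
    print(label, [f"U+{ord(c):04X}" for c in s])
```

So SPACE plus a non-spacing mark appears only under the compatibility forms (NFKD, NFKC); under NFD the spacing accent stays a single character.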
Greetings,
We are using Oracle9i with application tier as 11i.
I wanted to know the differences between AL32UTF8 and UTF8. My database (oracle) will be in AL32UTF8 format. Will the applications that require multibyte characters work as they are functioning in UTF8 format?
Would be great if
On 06/08/2003 03:38, Kent Karlsson wrote:
Kenneth Whistler wrote:
Kent Karlsson said:
I see no particular *technical* problem with using WJ, though. In
contrast
to the suggestion of using CGJ (re. another problem)
anywhere else but
at the end of a combining sequence. CGJ
On Saturday, August 09, 2003 12:49 AM, Michael Everson [EMAIL PROTECTED] wrote:
At 14:22 -0700 2003-08-08, Kenneth Whistler wrote:
Philippe, you are tilting at windmills, here. There is no chance
that the UTC is going to consider such a character, in my
assessment, let alone give it the
On Friday, August 08, 2003 9:16 PM, Peter Kirk [EMAIL PROTECTED] wrote:
On 07/08/2003 13:57, John Cowan wrote:
... But an immediate problem comes to mind: what if there is a
line break between the two base characters?
What if there is a line break between the two characters joined by a
Michael Everson wrote:
More
horrifying is the idiotic euro is immune to grammar error which
continues to be broadcast daily by our television and radio
stations,
all because people with power lacked the moral courage to say
oops,
yeah, that was the wrong interpretation of the Directive
At 18:49 +0200 2003-08-08, Chris Jacobs wrote:
This seems to be a clear difference from colorful scripts, where I think
there is an agreement about which glyph represents which sound.
So I think the analogy between pigpen and colorful scripts does not hold.
Two gifs on two websites does not
At 05:27 PM 8/8/2003, Kenneth Whistler wrote:
Because the mechanism for doing so -- application to SPACE or
to NBSP -- has been specified by the standard for a decade now.
True enough, but I'm also a bit concerned about this mechanism because
white space characters are another pesky thing that
Ken's point of course is that however bizarre the backing store for
Sindarin and English Tengwar modes may be, combining characters per
se must follow their base characters no matter what.
--
Michael Everson * * Everson Typography * * http://www.evertype.com
On Thursday, August 07, 2003 11:29 PM, Michael Everson [EMAIL PROTECTED] wrote:
Ken's point of course is that however bizarre the backing store for
Sindarin and English Tengwar modes may be, combining characters per
se must follow their base characters no matter what.
Even if that breaks the
From: John Cowan [EMAIL PROTECTED]
Peter Kirk scripsit:
So far so good, but when I get to an accent with no predefined spacing
variant, I have a problem!
No you don't. If you want to say Seagull is the diacritic used to
represent linguolabial sounds in the IPA, then you just encode
On Wednesday, August 06, 2003 11:48 PM, Peter Kirk [EMAIL PROTECTED] wrote:
OK, what kind of markup should I use, in any well-known markup
language, to ensure that an isolated diacritic is centred in the
space between the words before and after it?
In plain text, I think that this encoding:
I would like to point out that with all due respect, how particular fonts or rendering
engines behave is only marginally relevant to the Unicode list. I think that we should
deal only with the Unicode specification.
A particular implementation or many implementations may not behave as expected,
At 08:55 -0700 2003-08-05, Doug Ewell wrote:
The original legislative attempt to dictate the exact proportions (and
even color) of the euro sign, regardless of the font in use, was just
silly.
That is very old history, as detailed on my website
Isn't the very notion of submit[ting] a FAQ question a contradiction in
terms? Surely, one merely ASKS a question. If enough people ask the same
question, we may then classify it as frequently asked.
It's like this. Newbies want to find things out. So they read books, and
look around on the web.
As far as I know, there are many topics not covered by ISO, for example
bi-directional behavior.
Jony
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of souravm
Sent: Tuesday, August 12, 2003 8:40 AM
To: unicode
Subject:
Collation isn't really based on combining sequences (even though UTS
10
specifies a certain spanning over non-blocking (combining)
This is a very ignorant question: where in your public documentation
are these issues discussed?
...
I still don't understand even what happens with basic
On 13/08/2003 15:54, Jony Rosenne wrote:
Suggested but not accepted.
I am inherently suspicious when pressure is being exerted to decide complex
and difficult questions in a hurry.
Jony
Jony, I am not trying to hurry anything. I am putting a lot of time and
effort into trying to reach proper
Philippe Verdy verdy_p at wanadoo dot fr wrote:
Note that these two ZW and SP classes of characters are *normative*.
Another proof that SPACE+diacritics is really a hack causing lots of
problems in the Unicode main standard and its standard annexes.
Has it occurred to anyone yet that the very
I might be able to help. Two questions:
1. How firmly have you tracked down the point at which this conversion
happens?
2. What is the datatype in the database? (text BLOB?, ntext BLOB? varchar?)
Michael wrote:
The Name Police reject this utterly. ZERO WIDTH cannot have an
expanding dynamic width.
Then what about ZERO WIDTH SPACE, which, according to TUS3, p. 238,
can grow to have a visible width when justified? And it has the
NamesList comment:
* nominally zero width, but may
On 11/08/2003 16:06, Mark Davis wrote:
Some of this seems to be in reference to an earlier contention that
Text Boundaries (inc. Lines) break between the space and the
non-spacing mark. I think this was attributed to Philippe.
[This may not be true: I don't actually read his email, because the
A new Unicode Technical Note on Deterministic Sorting is now available:
http://www.unicode.org/notes/tn9/
Unicode Technical Notes provide for the publication of information that
may be of interest to implementers or readers of the Unicode Standard, or
to users of programs which
Elaine Keown
still in Madison WISC
Hello,
Responding again to the deep interest in Aramaic expressed
on the list, I am writing with a suggested preliminary
Alternative or possibly Countercultural version of the
Roadmap and a New, Improved Acronym for EUSAS (Egyptian,
I think we will keep the Roadmap as it is for the time being.
--
Michael Everson * * Everson Typography * * http://www.evertype.com
From: Kenneth Whistler [EMAIL PROTECTED]
It is perfectly reasonable, as I see it, to consider the
SPACE in a SPACE, NSM sequence to be:
a. significant
b. part of the characters in a document that are not markup
(at least in the cases we are talking about, since the
problem is
At 12:27 AM 8/7/2003, [EMAIL PROTECTED] wrote:
My desire is to create (make) a set of fonts for the Akan
language for Windows 2000 to begin with. I have been able to create a
crude version for my own use but I know that the people of Ghana would be
very happy to be able to install a
In message [EMAIL PROTECTED] Michael Everson writes:
Re: Colourful scripts and Aramaic
This is nearly off topic, but I'd be glad of any clarifications, or
references that anybody has.
In message [EMAIL PROTECTED] Michael Everson
wrote in response to Peter Kirk, with a clarification I agree with
Well, I've been promising that good things would come
to those who wait. ;-)
At last, the Unicode website has been updated with the
online chapters for Unicode 4.0. See:
http://www.unicode.org/versions/Unicode4.0.0/
Or just go to the Unicode 4.0 link from the home page.
Enjoy.
--Ken
P.S.
- Original Message -
From: Peter Kirk [EMAIL PROTECTED]
To: Jon Hanna [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, August 13, 2003 3:05 PM
Subject: Re: Questions on ZWNBS - for line initial holam plus alef
On 13/08/2003 04:44, Jon Hanna wrote:
No, the safe thing to do
On 11/08/2003 06:59, Jon Hanna wrote:
There are only two theoretical problems that I can see here, the first is
that a whitespace character other than space gets converted to space by
attribute value normalisation, and that this changes the meaning of the text
in some way. This could only occur
On 13/08/2003 04:44, Jon Hanna wrote:
No, the safe thing to do (and the thing that is done) is to treat the space
as a space ignoring the fact that the NMTOKEN contains a combining
character, this is even safer than your suggestion since it can't
mis-identify the combining properties of a
Philippe Verdy posted:
Could ZWS+combining diacritic be the best solution for
isolated diacritics in text?
From http://www.unicode.org/book/ch04.pdf:
* Such characters may be large enough to affect the placement of
their base character relative to preceding and succeeding base
characters.
- Original Message -
From: Doug Ewell [EMAIL PROTECTED]
To: Unicode Mailing List [EMAIL PROTECTED]
Cc: Peter Kirk [EMAIL PROTECTED]; Kenneth Whistler
[EMAIL PROTECTED]
Sent: Monday, August 11, 2003 5:39 PM
Subject: Re: Questions on ZWNBS - for line initial holam plus alef
Peter Kirk
On Monday, August 11, 2003 12:27 AM, Kenneth Whistler [EMAIL PROTECTED] wrote:
A point I keep trying to make, but which often gets overlooked
by people trying to code Unicode mechanisms for dealing with
edge cases, is that the design goal of the Unicode Standard is,
and always has been, to
Peter Kirk scripsit:
On 13/08/2003 11:09, Philippe Verdy wrote:
... For this reason, defective
combining sequences (combining characters without a leading base
character) should be forbidden (invalid for XML).
If there is even the remotest possibility of this happening, we need to
Jay,
Oracle's UTF-8 is not really a valid encoding. It
encodes surrogates as if they were characters. They kept the old Unicode
2.x code that only supports BMP to provide sort key compatibility for clients
who never upgraded to Unicode 3.0 support and are using 16 bit character
encoding
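What "encodes surrogates as if they were characters" means in bytes can be shown with a short Python sketch. It builds the CESU-8 form by encoding each UTF-16 code unit separately, which is an illustration of the encoding form only and makes no claim about how Oracle implements it; the function name cesu8_encode is hypothetical.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: each UTF-16 code unit is encoded on its own."""
    out = bytearray()
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets a lone surrogate code unit be written as three bytes.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)

supplementary = "\U00010400"  # DESERET CAPITAL LETTER LONG I, outside the BMP
print(supplementary.encode("utf-8").hex(" "))   # f0 90 90 80
print(cesu8_encode(supplementary).hex(" "))     # ed a0 81 ed b0 80
print(cesu8_encode("\u00E9") == "\u00E9".encode("utf-8"))  # True
```

For BMP characters the two forms are byte-for-byte identical, so the difference only shows up once supplementary characters are stored: four bytes in UTF-8 versus six in CESU-8.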
On 12/08/2003 20:28, John Cowan wrote:
Peter Kirk scripsit:
2) In attribute values, LF, CR, and TAB characters are normalized to
spaces. Not relevant here.
This would be relevant if it is legal for the character after LF, CR,
and TAB to be a combining mark. Is this legal? In this
Dear Unicode and Unicore List Subscribers,
The release of the Unicode Standard, Version 4.0 is right around the
corner. There is still time to place your individual or group orders and
to get the book sent to you directly from the publisher, fresh off the
press.
Anyone placing bulk orders is