In message [EMAIL PROTECTED]
"G. Adam Stanislav" [EMAIL PROTECTED] wrote:
At 21:08 29-11-2000 -0800, Mark Davis wrote:
1. The Unicode Technical Committee has modified the definition of UTF-8 to
forbid conformant implementations from interpreting non-shortest forms for
BMP
Branislav,
We're working on this; actually I am writing a paper which deals with some
of the proposed solutions. That should be ready in a day or so. In the
meantime, can you give me an example of a Czech or Slovak word in which
ch is a grapheme, and another in which ch meet at a morpheme
Branislav Tichy [EMAIL PROTECTED] wrote:
b) there are compound words, which have these sequences on a word border,
and in this case, they stand for two separate graphemes and _are_ sorted
as c+h, d+z and so forth.
the proper collation algorithm would therefore have to realise (imho),
whether
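The distinction described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a real collation algorithm: the alphabet is reduced to unaccented letters, "ch" is placed after "h" as in Czech and Slovak ordering, and the `|` boundary marker is purely hypothetical (the thread does not settle on any particular character for this).

```python
# Simplified alphabet: unaccented letters only, with the digraph "ch"
# treated as ONE collation element that sorts after "h".
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {elem: i for i, elem in enumerate(ALPHABET)}

# Hypothetical marker meaning "c and h here are two separate graphemes".
BOUNDARY = "|"


def collation_key(word):
    """Return a list of ranks usable as a sort key (illustrative only)."""
    w = word.lower()
    key = []
    i = 0
    while i < len(w):
        if w[i] == BOUNDARY:
            i += 1  # the marker itself carries no weight
        elif w.startswith("ch", i):
            key.append(RANK["ch"])  # digraph: one element
            i += 2
        else:
            key.append(RANK[w[i]])  # ordinary single letter
            i += 1
    return key


words = ["idea", "chata", "hrad", "cena"]
print(sorted(words, key=collation_key))
print(collation_key("ch"), collation_key("c|h"))
```

Without the marker, the code has no way to tell a digraph from a c+h meeting at a compound boundary, which is the problem the message raises; a real implementation would need a dictionary or an explicit character in the text.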
At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote:
Spoken language is not necessarily at all the same
thing as written language.
There are e.g. plenty of mutually incomprehensible
forms of spoken English which might each deserve a
code in a standard for spoken languages but
What is the Japanese collation sequence? Oh yeah, there
are a bunch of Roman letters thrown in. And digits
too.
Yeah, anime CDs.
Do I just katakanize the roman letters? And is "Sanzenin"
"sa-n-se-n-i-n" or "3-0-0-0-i-n"? And how do I do long
vowel mark?
| ||\ __/__ | | _/_
We know of specific situations that caused problems, as outlined in the
Corrigendum.
- Process A performs security checks, but does not check for non-shortest
  forms.
- Process B accepts the byte sequence from process A, and transforms it
  into UTF-16 while interpreting non-shortest forms.
-
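The scenario above can be made concrete with a small Python sketch. The payload, the function names, and the "no slash byte" check are all hypothetical and chosen only for illustration; the one fixed fact is the arithmetic, where the two-byte sequence C0 AF is a non-shortest form of U+002F ("/").

```python
# Hypothetical input: a path where "/" is written as the two-byte
# non-shortest form C0 AF instead of the single byte 2F.
payload = b"..\xc0\xaf..\xc0\xafetc"


def process_a_is_safe(data: bytes) -> bool:
    """Process A (hypothetical): rejects input containing a slash byte,
    but never looks for non-shortest forms."""
    return b"/" not in data


def lenient_two_byte(lead: int, trail: int) -> int:
    """What a lenient Process B computes for a two-byte sequence:
    5 payload bits from the lead byte, 6 from the trail byte."""
    return ((lead & 0x1F) << 6) | (trail & 0x3F)


def strict_decoder_rejects(data: bytes) -> bool:
    """A conformant decoder (here Python's built-in one) refuses the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


print(process_a_is_safe(payload))            # the check passes
print(chr(lenient_two_byte(0xC0, 0xAF)))     # yet B would produce a slash
print(strict_decoder_rejects(payload))       # a strict decoder stops it
```

The point of the sketch is that neither process is wrong in isolation; the hole appears only when a check on raw bytes is followed by a decoder that interprets non-shortest forms.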
On Thu, Nov 30, 2000 at 05:18:59AM -0800, Brendan Murray/DUB/Lotus wrote:
Branislav Tichy [EMAIL PROTECTED] wrote:
b) there are compound words, which have these sequences on a word border,
and in this case, they stand for two separate graphemes and _are_ sorted
as c+h, d+z and so forth.
the
Keld Jørn Simonsen [EMAIL PROTECTED] wrote:
I have no examples off my head on Danish names
where "aa" actually means two a-s, pronounced as two sounds.
I know of at least one - what about "Haageman"? That's pronounced (using
English) "Hay-e-man".
Brendan
"G. Adam Stanislav" [EMAIL PROTECTED] wrote:
1. The Unicode Technical Committee has modified the definition of
UTF-8 to forbid conformant implementations from interpreting non-
shortest forms for BMP characters,
I find this silly. That creation of such forms would be forbidden I
can see
On Thu, Nov 30, 2000 at 07:52:37AM -0800, Brendan Murray/DUB/Lotus wrote:
Keld Jørn Simonsen [EMAIL PROTECTED] wrote:
I have no examples off my head on Danish names
where "aa" actually means two a-s, pronounced as two sounds.
I know of at least one - what about "Haageman"? That's
Elliotte Rusty Harold [EMAIL PROTECTED] wrote:
At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote:
Spoken language is not necessarily at all the same thing as
written language. There are e.g. plenty of mutually
incomprehensible forms of spoken English which might each deserve
Elliotte Rusty Harold wrote:
I've yet to encounter a spoken
version of English that I couldn't understand, after at most a couple
of minutes of accustoming myself to the accent.
You live in a country where dialect differentiation is a feeble thing,
consisting mainly in pronunciation, and
The soft hyphen is not sufficient, since in other languages the case where
two letters must be distinguished in collation may not fall on a syllable
boundary, or allow hyphenation between them.
The UTC looked at all the possible existing boundary-control characters;
none of them really work for
Keld Jørn Simonsen [EMAIL PROTECTED] wrote:
Anyway, you may have been fooled by the "g" which may be numb,
or pronounced like a short "u". so it is:
Haa-ge-man
Hå ue man
Nope - the first syllable in this surname *is* pronounced as the English
"hay" rather than "hoe". And I used this
Kevin Bracey wrote:
I find this silly. That creation of such forms would be forbidden I can see
and agree to. But interpretation? I understand the reasoning when security
is an issue. But why make it flat illegal? There are many applications
where such a sequence poses no security danger.
And to be clear, what it means in this case:
1) People have security concerns about UTF-8
2) The Unicode Consortium has an official solution to address these
concerns
3) Your implementation does not
The "People" from (1) can believe what they will about your implementation!
MichKa
Michael
On Thu, 30 Nov 2000, Antoine Leca wrote:
Carl W. Brown wrote:
#3 French also has other articles such as d'.
Yes. But this one, contrary to "l'" can according to the context,
either be the contraction (élidé) of "de", or can be a genuine
part of a proper name... When it comes to
On Thu, Nov 30, 2000 at 09:22:54AM -0800, Brendan Murray/DUB/Lotus wrote:
Keld Jørn Simonsen [EMAIL PROTECTED] wrote:
Anyway, you may have been fooled by the "g" which may be numb,
or pronounced like a short "u". so it is:
Haa-ge-man
Hå ue man
Nope - the first syllable in this
John Cowan noted:
In general, Geordie (the traditional dialect spoken around the Tyne
River in England) is considered to be the English dialect most difficult
for North Americans.
To that I would add Glaswegian. When watching the
Scots-produced mystery shows that show up on PBS in the
| ||\ __/__ | | _/_ | || /
| _|_ ,--, / \ /_| -+- / --- | /
|V T_)| | |\ | ||/ _
\_/ T / \ / __/ | /--- \_/ L/ \
Alain LaBonté [EMAIL PROTECTED] wrote:
Actual author unknown (anonymous)...
Kenneth Whistler wrote:
To that I would add Glaswegian. When watching the
Scots-produced mystery shows that show up on PBS in the United
States on occasion, my wife and I often turn to each other
in bafflement and say, "Subtitles, please."
Scots is a separate language! If you understand
John Cowan replied:
Kenneth Whistler wrote:
To that I would add Glaswegian. When watching the
Scots-produced mystery shows that show up on PBS in the United
States on occasion, my wife and I often turn to each other
in bafflement and say, "Subtitles, please."
Scots is a separate
On Thu, Nov 30, 2000 at 04:55:15AM -0800, Michael Everson wrote:
We're working on this; actually I am writing a paper which deals with some
of the proposed solutions. That should be ready in a day or so. In the
meantime, can you give me an example of a Czech or Slovak word in which
ch is a
On Thu, Nov 30, 2000 at 07:12:37AM -0800, Mark Davis wrote:
We know of specific situations that caused problems, as outlined in the
Corrigendum.
That does not justify forbidding it in other situations (ask the NRA :) ).
Adam
--
When a finger points at the Moon... do you look at the Moon?
Or,
On Thu, Nov 30, 2000 at 03:44:00AM -0800, Branislav Tichy wrote:
hello,
this subject (or alike) has been probably already discussed, but let me
ask one more question about it: sequences vrs collating
i have recently read the page //www.unicode.org/unicode/standard/where/
and i basically
On Thu, Nov 30, 2000 at 10:18:07AM -0800, Markus Scherer wrote:
you are free to write and use a non-conformant implementation. just be aware of what
that means... :-)
markus
I guess it means I'm a non-conformist. :)
I am currently working on software that translates mark-up made in one
mark-up
Adam said:
On Thu, Nov 30, 2000 at 10:18:07AM -0800, Markus Scherer wrote:
you are free to write and use a non-conformant implementation. just be aware of
what that means... :-)
markus
I guess it means I'm a non-conformist. :)
I am currently working on software that translates mark-up
On Thu, Nov 30, 2000 at 04:48:56PM -0800, G. Adam Stanislav wrote:
If the source (in Ister) uses illegal but decipherable UTF-8, my
software accepts it. Naturally, before it sends it out it transforms
it to perfectly legal UTF-8. The idea I should reject it is silly
(and, no, the "internal
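One possible reading of "accept it, then send out legal UTF-8" is sketched below in Python. This is not Adam's software, whose internals the thread does not show; `decode_lenient` and `normalize` are hypothetical names, and the sketch covers only one- to three-byte sequences (the BMP). Note that this lenient behaviour is exactly what the corrigendum makes non-conformant.

```python
def decode_lenient(data: bytes) -> str:
    """Decode 1- to 3-byte UTF-8 sequences, tolerating non-shortest forms.

    Illustrative only: longer sequences are rejected outright, and
    surrogate code points are refused because they cannot be re-encoded.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            cp, extra = lead, 0
        elif 0xC0 <= lead <= 0xDF:
            cp, extra = lead & 0x1F, 1
        elif 0xE0 <= lead <= 0xEF:
            cp, extra = lead & 0x0F, 2
        else:
            raise ValueError(f"unsupported lead byte at offset {i}")
        trail = data[i + 1:i + 1 + extra]
        if len(trail) != extra or any((t & 0xC0) != 0x80 for t in trail):
            raise ValueError(f"bad continuation bytes at offset {i}")
        for t in trail:
            cp = (cp << 6) | (t & 0x3F)  # append 6 payload bits per trail byte
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point at offset {i}")
        out.append(chr(cp))
        i += 1 + extra
    return "".join(out)


def normalize(data: bytes) -> bytes:
    """Accept lenient input, emit shortest-form UTF-8."""
    return decode_lenient(data).encode("utf-8")


print(normalize(b"a\xc0\xafb"))
print(normalize(b"\xe0\x80\xaf"))
```

Re-encoding through Python's own encoder guarantees the output is shortest-form, so anything downstream sees only legal UTF-8; whether accepting the input in the first place is acceptable is the policy question the thread is arguing about.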
On Thu, 30 Nov 2000, Kenneth Whistler wrote:
Scots is a separate language! If you understand anything at all
it's by a happy accident. (There is of course Scots-flavored
English as well, which is another matter.)
I was, of course, referring to Scots (alleged) English, and not
to
hi,
I am facing problems when I am trying to display non-English characters
on my browser. I am getting "?" and I want to see characters in
various other languages too. What should I do?
Should I install any special software or should I configure my browser?
Please advise as I have to