Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread Kevin Bracey
In message [EMAIL PROTECTED] "G. Adam Stanislav" [EMAIL PROTECTED] wrote: At 21:08 29-11-2000 -0800, Mark Davis wrote: 1. The Unicode Technical Committee has modified the definition of UTF-8 to forbid conformant implementations from interpreting non-shortest forms for BMP

Re: sequences and stuff

2000-11-30 Thread Michael Everson
Branislav, We're working on this; actually I am writing a paper which deals with some of the proposed solutions. That should be ready in a day or so. In the meantime, can you give me an example of a Czech or Slovak word in which ch is a grapheme, and another in which ch meet at a morpheme

Re: sequences and stuff

2000-11-30 Thread Brendan Murray/DUB/Lotus
Branislav Tichy [EMAIL PROTECTED] wrote: b) there are compound words, which have these sequences on a word border, and in this case, they stands for two separate graphemes and _are_ sorted as c+h, d+z a.s.f. the proper collation algorithmus would therefore have to realise (imho), whether

RE: [OT] Re: the Ethnologue

2000-11-30 Thread Elliotte Rusty Harold
At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote: Spoken language is not necessarily at all the same thing as written language. There are e.g. plenty of mutually incomprehensible forms of spoken English which might each deserve a code in a standard for spoken languages but

Japanese collation sequence?

2000-11-30 Thread 11digitboy
What is the Japanese collation sequence? Oh yeah, there are a bunch of Roman letters thrown in. And digits too. Yeah, anime CDs. Do I just katakanize the roman letters? And is "Sanzenin" "sa-n-se-n-i-n" or "3-0-0-0-i-n"? And how do I do long vowel mark?

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread Mark Davis
We know of specific situations that caused problems, as outlined in the Corrigendum.
- Process A performs security checks, but does not check for non-shortest forms.
- Process B accepts the byte sequence from process A, and transforms it into UTF-16 while interpreting non-shortest forms.
- ...
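The two-process scenario above can be sketched in Python. This is a minimal illustration, not code from the Corrigendum: the helper names (`naive_decode_2byte`, `passes_security_check`) and the choice of a single "no literal '/' byte" check are assumptions, picked only to show how a non-shortest form slips past a byte-level filter and is then interpreted downstream.

```python
def naive_decode_2byte(seq: bytes) -> str:
    """Decode a 2-byte UTF-8 sequence with no shortest-form check.

    Hypothetical helper standing in for "process B".
    """
    b0, b1 = seq[0], seq[1]
    if b0 & 0xE0 != 0xC0 or b1 & 0xC0 != 0x80:
        raise ValueError("not a 2-byte UTF-8 sequence")
    return chr(((b0 & 0x1F) << 6) | (b1 & 0x3F))


def passes_security_check(data: bytes) -> bool:
    """Hypothetical "process A": rejects input containing a literal '/' byte."""
    return b"/" not in data


# 0xC0 0xAF is a non-shortest (overlong) two-byte encoding of U+002F '/'
payload = b"..\xc0\xafetc"

print(passes_security_check(payload))    # True: no 0x2F byte is present
print(naive_decode_2byte(payload[2:4]))  # /  (process B sees a slash)

# A conformant decoder, such as Python's built-in codec, refuses the input
try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected by strict decoder:", exc.reason)
```

Neither process is wrong in isolation; the hole only appears when the check and the lenient interpretation are combined, which is why the Corrigendum targets interpretation and not just generation.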

Re: sequences and stuff

2000-11-30 Thread Keld Jørn Simonsen
On Thu, Nov 30, 2000 at 05:18:59AM -0800, Brendan Murray/DUB/Lotus wrote: Branislav Tichy [EMAIL PROTECTED] wrote: b) there are compound words, which have these sequences on a word border, and in this case, they stands for two separate graphemes and _are_ sorted as c+h, d+z a.s.f. the

Re: sequences and stuff

2000-11-30 Thread Brendan Murray/DUB/Lotus
Keld Jørn Simonsen [EMAIL PROTECTED] wrote: I have no examples off my head on Danish names where "aa" actually means two a-s, pronounced as two sounds. I know of at least one - what about "Haageman"? That's pronounced (using English) "Hay-e-man". Brendan

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread Doug Ewell
"G. Adam Stanislav" [EMAIL PROTECTED] wrote: 1. The Unicode Technical Committee has modified the definition of UTF-8 to forbid conformant implementations from interpreting non-shortest forms for BMP characters, I find this silly. That creation of such forms would be forbidden I can see

Re: sequences and stuff

2000-11-30 Thread Keld Jørn Simonsen
On Thu, Nov 30, 2000 at 07:52:37AM -0800, Brendan Murray/DUB/Lotus wrote: Keld Jørn Simonsen [EMAIL PROTECTED] wrote: I have no examples off my head on Danish names where "aa" actually means two a-s, pronounced as two sounds. I know of at least one - what about "Haageman"? That's

RE: [OT] Re: the Ethnologue

2000-11-30 Thread Doug Ewell
Elliotte Rusty Harold [EMAIL PROTECTED] wrote: At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote: Spoken language is not necessarily at all the same thing as written language. There are e.g. plenty of mutually incomprehensible forms of spoken English which might each deserve

Re: [OT] Re: the Ethnologue

2000-11-30 Thread John Cowan
Elliotte Rusty Harold wrote: I've yet to encounter a spoken version of English that I couldn't understand, after at most a couple of minutes of accustoming myself to the accent. You live in a country where dialect differentiation is a feeble thing, consisting mainly in pronunciation, and

Re: sequences and stuff

2000-11-30 Thread Mark Davis
The soft hyphen is not sufficient, since in other languages the case where two letters must be distinguished in collation may not fall on a syllable boundary, or allow hyphenation between them. The UTC looked at all the possible existing boundary-control characters; none of them really work for
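The collation problem being discussed can be sketched as a contraction-aware sort key. This is a simplified, hypothetical model (plain a-z input only, no diacritics or case levels): "ch" is treated as one letter sorting after "h", as in Czech and Slovak, and a placeholder `BLOCKER` character splits the contraction at a morpheme boundary. The thread does not say which character, if any, should play that role, so `"|"` is used purely as an ASCII stand-in.

```python
# Simplified Czech-like alphabet: "ch" is a single letter sorted after "h".
ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k",
            "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

# Hypothetical contraction-blocking mark (an assumption, not a real
# Unicode recommendation); ignored for ordering, but it separates c + h.
BLOCKER = "|"


def sort_key(word: str) -> list[int]:
    """Return a list of letter ranks, treating "ch" as one collation unit."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])  # contraction: one unit after "h"
            i += 2
        elif word[i] == BLOCKER:
            i += 1  # skipped, but it prevented the "ch" match above
        else:
            key.append(RANK[word[i]])  # KeyError for letters outside ALPHABET
            i += 1
    return key


print(sorted(["chata", "hrad", "cibule"], key=sort_key))
# ['cibule', 'hrad', 'chata']
```

A word written with the blocker between "c" and "h" then sorts as c followed by h, which is the compound-word behaviour Branislav describes; the open question in the thread is which real character can carry that meaning without side effects on hyphenation or rendering.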

Re: sequences and stuff

2000-11-30 Thread Brendan Murray/DUB/Lotus
Keld Jørn Simonsen [EMAIL PROTECTED] wrote: Anyway, you may have been fooled by the "g" which may be numb, or pronounced like a short "u". so it is: Haa-ge-man Hå ue man Nope - the first syllable in this surname *is* pronounced as the English "hay" rather than "hoe". And I used this

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread Markus Scherer
Kevin Bracey wrote: I find this silly. That creation of such forms would be forbidden I can see and agree to. But interpretation? I understand the reasoning when security is an issue. But why make it flat illegal? There are many applications where such a sequence poses no security danger.

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread Michael (michka) Kaplan
And to be clear, what it means in this case: 1) People have security concerns about UTF-8 2) The Unicode Consortium has an official solution to address these concerns 3) Your implementation does not The "People" from (1) can believe what they will about your implementation! MichKa Michael

Re: Unicode Case Mappings UTR #21

2000-11-30 Thread James E. Agenbroad
On Thu, 30 Nov 2000, Antoine Leca wrote: Carl W. Brown wrote: #3 French also has other articles such as d'. Yes. But this one, contrary to "l'" can according to the context, either be the contraction (élidé, "elided") of "de", or can be a genuine part of a proper name... When it comes to

Re: sequences and stuff

2000-11-30 Thread Keld Jørn Simonsen
On Thu, Nov 30, 2000 at 09:22:54AM -0800, Brendan Murray/DUB/Lotus wrote: Keld Jørn Simonsen [EMAIL PROTECTED] wrote: Anyway, you may have been fooled by the "g" which may be numb, or pronounced like a short "u". so it is: Haa-ge-man Hå ue man Nope - the first syllable in this

Re: [OT] Re: the Ethnologue

2000-11-30 Thread Kenneth Whistler
John Cowan noted: In general, Geordie (the traditional dialect spoken around the Tyne River in England) is considered to be the English dialect most difficult for North Americans. To that I would add Glaswegian. When watching the Scots-produced mystery shows that show up on PBS in the

Re: Fwd: Direct dispatch from London

2000-11-30 Thread 11digitboy
Alain LaBonté [EMAIL PROTECTED] wrote: Actual author unknown (anonymous)...

Re: [OT] Re: the Ethnologue

2000-11-30 Thread John Cowan
Kenneth Whistler wrote: To that I would add Glaswegian. When watching the Scots-produced mystery shows that show up on PBS in the United States on occasion, my wife and I often turn to each other in bafflement and say, "Subtitles, please." Scots is a separate language! If you understand

Re: [OT] Re: the Ethnologue

2000-11-30 Thread Kenneth Whistler
John Cowan replied: Kenneth Whistler wrote: To that I would add Glaswegian. When watching the Scots-produced mystery shows that show up on PBS in the United States on occasion, my wife and I often turn to each other in bafflement and say, "Subtitles, please." Scots is a separate

Re: sequences and stuff

2000-11-30 Thread G. Adam Stanislav
On Thu, Nov 30, 2000 at 04:55:15AM -0800, Michael Everson wrote: We're working on this; actually I am writing a paper which deals with some of the proposed solutions. That should be ready in a day or so. In the meantime, can you give me an example of a Czech or Slovak word in which ch is a

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread G. Adam Stanislav
On Thu, Nov 30, 2000 at 07:12:37AM -0800, Mark Davis wrote: We know of specific situations that caused problems, as outlined in the Corrigendum. That does not justify forbidding it in other situations (ask the NRA :) ). Adam -- When a finger points at the Moon... do you look at the Moon? Or,

Re: sequences and stuff

2000-11-30 Thread Keld Jørn Simonsen
On Thu, Nov 30, 2000 at 03:44:00AM -0800, Branislav Tichy wrote: hello, this subject (or alike) has been probably already discussed, but let me ask one more question about it: sequences vrs collating i have recently read the page //www.unicode.org/unicode/standard/where/ and i basically

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread G. Adam Stanislav
On Thu, Nov 30, 2000 at 10:18:07AM -0800, Markus Scherer wrote: you are free to write and use a non-conformant implementation. just be aware of what that means... :-) markus I guess it means I'm a non-conformist. :) I am currently working on software that translates mark-up made in one mark-up

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread Kenneth Whistler
Adam said: On Thu, Nov 30, 2000 at 10:18:07AM -0800, Markus Scherer wrote: you are free to write and use a non-conformant implementation. just be aware of what that means... :-) markus I guess it means I'm a non-conformist. :) I am currently working on software that translates mark-up

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread David Starner
On Thu, Nov 30, 2000 at 04:48:56PM -0800, G. Adam Stanislav wrote: If the source (in Ister) uses illegal but decipherable UTF-8, my software accepts it. Naturally, before it sends it out it transforms it to perfectly legal UTF-8. The idea I should reject it is silly (and, no, the "internal
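The accept-then-normalize approach Adam describes (decode decipherable non-shortest input, always emit legal UTF-8) might look roughly like the sketch below. The function names are hypothetical and this is not Ister's code; it simply omits the shortest-form check on input and relies on Python's `str.encode("utf-8")`, which only produces shortest-form output.

```python
def lenient_utf8_decode(data: bytes) -> str:
    """Decode UTF-8 while also accepting non-shortest (overlong) forms.

    Hypothetical sketch: handles 1- to 4-byte sequences and raises
    ValueError on input it cannot decipher.
    """
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            length, cp = 1, b0
        elif b0 & 0xE0 == 0xC0:
            length, cp = 2, b0 & 0x1F
        elif b0 & 0xF0 == 0xE0:
            length, cp = 3, b0 & 0x0F
        elif b0 & 0xF8 == 0xF0:
            length, cp = 4, b0 & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte 0x{b:02X}")
            cp = (cp << 6) | (b & 0x3F)
        # Deliberately no shortest-form check: this is the lenient part.
        # chr() itself raises ValueError for values above U+10FFFF.
        out.append(chr(cp))
        i += length
    return "".join(out)


def normalize_utf8(data: bytes) -> bytes:
    """Re-emit input as legal UTF-8 (encode() always uses shortest forms).

    Lone surrogates decoded above make encode() raise UnicodeEncodeError.
    """
    return lenient_utf8_decode(data).encode("utf-8")


print(normalize_utf8(b"A\xc0\xafB"))  # b'A/B'
```

In a pipeline like the one in Mark Davis's scenario, the risk would come from running any security check on the raw input before this normalization step rather than on its output.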

Re: [OT] Re: the Ethnologue

2000-11-30 Thread John Cowan
On Thu, 30 Nov 2000, Kenneth Whistler wrote: Scots is a separate language! If you understand anything at all it's by a happy accident. (There is of course Scots-flavored English as well, which is another matter.) I was, of course, referring to Scots (alleged) English, and not to

display problems on browser

2000-11-30 Thread sreekant
hi, I am facing problems when I am trying to display non-English characters in my browser. I am getting "?" and I want to see characters in various other languages too. What should I do? Should I install any special software or should I configure my browser. Please advise as I have to
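One common way such "?" characters can arise (an assumption about this setup, which the message does not describe) is text being converted somewhere along the way into a legacy character set that lacks the characters, with each unmappable character replaced by "?". A minimal Python illustration, using ISO-8859-1 only as an example target:

```python
# Text containing characters from several scripts
text = "Český, Ελληνικά, 日本語"

# Simulate a conversion to ISO-8859-1 (Latin-1) with replacement:
# anything Latin-1 cannot represent becomes "?"
latin1 = text.encode("iso-8859-1", errors="replace")
shown = latin1.decode("iso-8859-1")
print(shown)  # ?eský, ????????, ???

# The same text survives a round trip through UTF-8 intact
assert text.encode("utf-8").decode("utf-8") == text
```

Once the data has been turned into "?" like this, no browser setting or font can recover it; the fix would have to be upstream, where the text is stored or served.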