
Re: Transcriptions of Unicode

2001-08-20 Thread Mark Davis


I happened upon a passage bolstering Marco's point that the English
pronunciation of long U (as yoo, /ju/) does derive from its being the
closest pronunciation that the English could make to the French
pronunciation of U (as /y/).

That passage is in Honni soit qui mal y pense : L'incroyable histoire de
l'amour entre le français et l'anglais. It is on page 158, in the chapter
"Pourquoi dit-on « miouzik » en anglais ?" ("Why do the English say
'miouzik'?"). (I recommend the book: the writing is clear and accessible,
even for someone with limited French like me.)

I put a link to the book on my booklist
(http://www.macchiato.com/books/nonfiction.html).

Mark

—

Ὀλίγοι ἔμφρονες πολλῶν ἀφρόνων φοβερώτεροι
(A few sensible men are more formidable than many fools.)
Πλάτωνος (Plato)
[http://www.macchiato.com]

- Original Message -
From: Marco Cimarosti [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Sent: Monday, January 15, 2001 01:15
Subject: RE: Transcriptions of Unicode


 Mark Davis wrote:
  Much as I admire and appreciate the French language (second only to
  Italian), the proximate derivation of "Unicode" was not from that
  language, and the transcription should not match the French
  pronunciation. Instead, it has solid Northern Californian roots (even
  though not exactly dating from the Gold Rush days).

 Of course, my comment about French pronunciation was only partially
 serious; I should have added a smiley. But I think that /ynikod/ is the
 actual pronunciation of "Unicode" in French (as opposed to most other
 European languages, which simply approximate the English pronunciation).
 So, as you explained that you are listing languages, and that you accept
 more than one language for each script, you might consider a second IPA
 example.

  According to the references I have, the prefix "uni" is directly from
  Latin while the word "code" is through French.

 I wonder what "directly from Latin" may mean in the case of English.
 Because of some timing problems, I would say it means: "through direct
 knowledge of *written* Latin".

 A direct derivation from Latin of English "uni-" would imply that, at some
 period, English scholars used to read Latin with a pronunciation influenced
 by French. In fact, the initial [ju:] is the regular English approximation
 of the French vowel [y]. (Is this likely?)

  The Indo-European would have been *oi-no-kau-do ("give one strike"): *kau
  apparently being related to [...] caudal, [...]

 Wow! So Unicode also means "single tail", after all... What would that be
 in Chinese? :-)

 Marco

Re: Transcriptions of Unicode

2001-01-29 Thread Marcin 'Qrczak' Kowalczyk


Mon, 15 Jan 2001 13:09:47 -0800 (GMT-0800), G. Adam Stanislav [EMAIL PROTECTED] writes:

 I would not be surprised if speakers of certain Slavic languages even
 changed the SPELLING to Unikod (with an acute over the [o]), as they
 have done with other imported words (such as futbal for football).

That is what we often do in Polish newsgroups, even if it's very
unofficial; I don't expect "Unicode" or "Unikod" in dictionaries soon.
We write it without an acute over the [o], which would mean a different
sound. In fact, "kod" is simply the Polish word for "code".

-- 
 __("  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^  SYGNATURA ZASTĘPCZA
QRCZAK

Re: Transcriptions of Unicode

2001-01-29 Thread Marcin 'Qrczak' Kowalczyk


Fri, 12 Jan 2001 07:28:18 -0800 (GMT-0800), Mark Davis [EMAIL PROTECTED] writes:

 According to the references I have, the prefix "uni" is directly from
 Latin while the word "code" is through French. The Indo-European would
 have been *oi-no-kau-do ("give one strike"): *kau apparently being
 related to such English words as: hew, haggle, hoe, hag, hay, hack,
 caudad, caudal, caudate, caudex, coda, codex, codicil, coward, incus,
 and Kovač (personal name: 'smith').

Oh, so my surname is related to Unicode? :-)
"Kowal" means "smith" in Polish.


Re: Transcriptions of Unicode

2001-01-16 Thread John Jenkins



On Monday, January 15, 2001, at 05:08 PM, G. Adam Stanislav wrote:

 That's exactly what I said. Unicode as an international standard will
 be pronounced internationally: Speakers of each language will have
 their own pronunciation, and some will even spell it differently.


Ah, got it.  I'm sorry, I misunderstood you as meaning that there would
be a single international pronunciation.  My apologies.

RE: Transcriptions of Unicode

2001-01-15 Thread Marco Cimarosti


Mark Davis wrote:
 Much as I admire and appreciate the French language (second only to
 Italian), the proximate derivation of "Unicode" was not from that
 language, and the transcription should not match the French
 pronunciation. Instead, it has solid Northern Californian roots (even
 though not exactly dating from the Gold Rush days).

Of course, my comment about French pronunciation was only partially
serious; I should have added a smiley. But I think that /ynikod/ is the
actual pronunciation of "Unicode" in French (as opposed to most other
European languages, which simply approximate the English pronunciation).
So, as you explained that you are listing languages, and that you accept
more than one language for each script, you might consider a second IPA
example.

 According to the references I have, the prefix "uni" is directly from
 Latin while the word "code" is through French.

I wonder what "directly from Latin" may mean in the case of English.
Because of some timing problems, I would say it means: "through direct
knowledge of *written* Latin".

A direct derivation from Latin of English "uni-" would imply that, at some
period, English scholars used to read Latin with a pronunciation influenced
by French. In fact, the initial [ju:] is the regular English approximation
of the French vowel [y]. (Is this likely?)

 The Indo-European would have been *oi-no-kau-do ("give one strike"): *kau
 apparently being related to [...] caudal, [...]

Wow! So Unicode also means "single tail", after all... What would that be
in Chinese? :-)

Marco

Re: Transcriptions of Unicode

2001-01-15 Thread Charles


Michael Everson wrote:

"The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates
me a lot. No one would pronounce "universe" with an [i]."

I beg to differ; "universe" is commonly pronounced with a short [i] in the
English Midlands.

Charles Cox

Re: Transcriptions of Unicode

2001-01-15 Thread Alain LaBonté


At 06:16 2001-01-15 -0800, Charles wrote:
Michael Everson wrote:

The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates
me a lot. No one would pronounce "universe" with an [i].
[Charles]
I beg to differ; "universe" is commonly pronounced with a short [i] in the
English Midlands.

[Alain] A schwa for the i and an English u in "Unicode" take the
pronunciation further and further from the way Unicode is pronounced in
French (as I can't write IPA on this list, I will add German "Ünicod" to
Marco's "ynicod" so that most of you know how we pronounce it). This word,
in its written form, shocks nobody in French (« et ce n'est pas peu dire ! »,
that is, "and that is saying a lot!"), not even the most bigoted and pious
purists of the French language...

 But if you insist that French speakers pronounce those two letters that
way, the opposite happens: we would have to write the mandated IPA
pronunciation as « Iouneucôde » in French (there is no real schwa in
French, IMHO)... Otherwise you create a serious problem in French.

 Please do not play with pronunciation... Unicode is not a standard about
pronunciation but a standard about writing, and that is precisely what
makes it an instrument of civilization... Writing tends to unite people;
spoken languages tend to divide them... An English speaker with a perfect
knowledge of written French who does not pronounce French correctly is
simply not understood, and the reverse is probably true too. I watch some
American TV programs (mainly sci-fi), but I have to turn on subtitles to
catch what I don't understand (unfortunately there are no subtitles in a
meeting held in English, and that is *always* a handicap for me).

 Please, no official IPA transcription for Unicode...

Alain LaBonté
Québec

Re: Transcriptions of Unicode

2001-01-15 Thread John Jenkins



On Monday, January 15, 2001, at 06:34 AM, Michael Everson wrote:

 The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates
 me a lot. No one would pronounce "universe" with an [i].


Then forgive me, Michael, for I have sinned.  I just sent Mark a
Deseret Alphabet transcription that uses [i].  In my defense, I tend to
find short, unstressed vowels hard to tell apart in many English
words.  I really don't know what vowel I use in "universe."  And the DA
doesn't have a true schwa symbol, anyway.

RE: Transcriptions of Unicode

2001-01-15 Thread Marco Cimarosti


{Notice: way off-topic}

Mark Davis wrote:
 There was a period well after the Norman invasion where a 
 large number of words came into English directly from
 Latin, which was still in widespread use among scholars.

Right. And it also was the language of priests, on both sides of the
Channel.

 [ju:] isn't an approximation to the French [y]. There was a 
 phase in the development of English called the Great Vowel
 Shift, where certain long vowels shifted back: a = [e:],
 e = [i:], i = [ai], o = [u:] (as in fool, move), u = [ju:].
 I don't remember when this was -- it's been a long time -- but
 I seem to recall that it was a bit before Shakespeare. The
 pronunciation of u in French shifted at some point from [u] 
 to [y]; I have no idea when this change happened, or if it
 would have affected the Latin spoken by the English at the time.
 Perhaps someone else knows.

No, sorry. Middle English [u:] normally became modern [au], e.g. "hus"
[hu:s] became "house" [haus].

I insist that [ju:] was the English rendering of the alien French phoneme
[y]. The fact that it did not become [jau] simply testifies that most French
words (re-)entered English *after* the GVS was concluded.

Marco

RE: Transcriptions of Unicode

2001-01-15 Thread Christopher John Fynn


Mark Davis [mailto:[EMAIL PROTECTED]] wrote:

 "Marco Cimarosti" [EMAIL PROTECTED] wrote:
  I wonder what "directly from Latin" may mean in the case of English.
  Because of some timing problems, I would say it means: "through direct
  knowledge of *written* Latin".
 
 There was a period well after the Norman invasion where a large number of
 words came into English directly from Latin, which was still in widespread
 use among scholars.

Yes, and that continued right into the early 20th century.  Even when I was in school a large
percentage of English schoolboys _had_ to learn Latin (and in many "public"
[private] schools they still do). This included "spoken" Latin, though I'm sure the
pronunciation taught was quite different from what it was in 55 BCE. Not all that long
ago you couldn't get into many English universities without having studied some Latin.

In English we still get plenty of scientific names and terms from Latin and Greek and 
many of these words eventually come into more common usage. 

 - Chris

Re: Transcriptions of Unicode

2001-01-15 Thread Curtis Clark


At 06:16 AM 1/15/01, Charles wrote:
Michael Everson wrote:

"The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates
me a lot. No one would pronounce "universe" with an [i]."

I beg to differ; "universe" is commonly pronounced with a short [i] in the
English Midlands.

And indeed on this side of the Pond, [i] is common (I find it unnatural to 
drop my tongue enough for the schwa), and I have heard (iirc) [i:] in the 
southeastern U.S.


-- 
Curtis Clark  http://www.csupomona.edu/~jcclark/
Biological Sciences Department Voice: (909) 869-4062
California State Polytechnic University  FAX: (909) 869-4078
Pomona CA 91768-4032  USA  [EMAIL PROTECTED]

Re: Transcriptions of Unicode

2001-01-15 Thread P. T. Rourke


 Just to expand upon this with data:

 1. When I learned Latin in the U.S. in the 1960s, we were taught a
 reconstructed Roman pronunciation.

Before someone asks him how anyone could know how a 1st-century CE Roman
pronounced things: reconstruction can be informed by such evidence as
transliterations of names into Greek by Greek authors, common misspellings,
metrical values, etc.  It can't be precisely accurate, but it's probably
not that far off.

BTW, Montaigne's first language was Latin; French was his second language.
His father wanted him to know his Latin like a Roman.  This is rather like
the Indian poet A. K. Ramanujan's description of his upbringing: on one
floor/wing of the house, only English was allowed; on another floor, only
Hindi(?); on a third, only Tamil. . . .

Patrick Rourke
[EMAIL PROTECTED]

Re: Transcriptions of Unicode

2001-01-15 Thread G. Adam Stanislav


At 06:16 15-01-2001 -0800, Charles wrote:
Michael Everson wrote:

"The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates
me a lot. No one would pronounce "universe" with an [i]."

I beg to differ; "universe" is commonly pronounced with a short [i] in the
English Midlands.

Besides, the name of an international standard will be pronounced
internationally.

For example, my native tongue, Slovak, does not even have the English
schwa sound. It would be ridiculous to expect Slovak Unicode users
to learn a new phoneme just so they pronounce Unicode properly. They
will pronounce it ['uniko:d], like it or not. :) And they will not turn
the [o:] into an [ou] (as the English speakers do) either. Plus the [i]
will be somewhere halfway between the English [i] and [i:].

I would not be surprised if speakers of certain Slavic languages even
changed the SPELLING to Unikod (with an acute over the [o]), as they
have done with other imported words (such as futbal for football). This
is simply because they follow the write-as-you-hear rule. Insisting
that they keep the original spelling would be linguistic imperialism,
hardly appropriate for the makers of Unicode.

Adam
--- 
Whiz Kid Technomagic - brand name computers for less.
See http://www.whizkidtech.net/pcwarehouse/ for details.

Re: Transcriptions of Unicode

2001-01-15 Thread Alain LaBonté


At 13:27 2001-01-15 -0500, [EMAIL PROTECTED] wrote:

My argument for the world converging on Dutch as the
only language that is written as it is spoken.  Vic

You really believe that "Schiphol" is written as it is pronounced?   (; (:

Alain
 

RE: Transcriptions of Unicode

2001-01-15 Thread jarkko . hietaniemi


   1. When I learned Latin in the U.S. in the 1960s, we were taught a
   reconstructed Roman pronunciation.
 
 Latin is still spoken in Rome, at the Vatican.
 
 So there is a Roman pronunciation even today... (;
 
 Just kidding... although what I say is true...
 
 Alain

How about a weekly radio news broadcast in Latin?

http://www.yle.fi/ylenykko/nuntii.html (in Finnish :-)
http://www.yle.fi/fbc/latini/ (in Latin)
http://www.yle.fi/fbc/latini/summary.html (in English)

Re: Transcriptions of Unicode

2001-01-15 Thread John Jenkins



On Monday, January 15, 2001, at 01:09 PM, G. Adam Stanislav wrote:


 Besides, the name of an international standard will be pronounced
 internationally.


Why?  I don't pronounce "Paris" the way the French do.  Why should I 
expect people from other countries to pronounce "Unicode" the way I do?

Heck, I don't even expect other *English* speakers to pronounce it the 
way I do.  I'm convinced I have a short i in the middle of it.

Re: Transcriptions of Unicode

2001-01-15 Thread Peter_Constable



On 01/15/2001 04:25:00 AM Michael Everson wrote:

The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates
me a lot. No one would pronounce "universe" with an [i].

Well, note that it was transcribed not with [i:] but with the open
counterpart (IPA symbol 319 rather than 301). That's certainly plausible,
perhaps in certain dialects or in careful speech. I have heard some
actually say [i] (not [i:]) but as I remember they were not native English
speakers (i.e. they had the phonology of another language influencing their
pronunciation when speaking English). I agree with Michael that schwa is
probably a lot more likely for most speakers, though.


- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]

Re: Transcriptions of Unicode

2001-01-15 Thread Patrick T. Rourke


He didn't actually say it: someone joked at a dinner or fundraiser that Dan
Quayle had felt guilty that he hadn't studied his Latin upon his visit to
Latin America, and the press picked it up as though it were a true report of
Quayle's own words.

What it says of the man that millions of people believed him capable of
saying it, I leave others to decide.  I suspect that he was nominated
because he reminded GHWB of someone.

Patrick Rourke
[EMAIL PROTECTED]

- Original Message -
From: "Tex Texin" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Monday, January 15, 2001 5:27 PM
Subject: Re: Transcriptions of "Unicode"


 Wasn't it Dan Quayle who said they speak Latin in Latin America?


 [EMAIL PROTECTED] wrote:
 
 1. When I learned Latin in the U.S. in the 1960s, we were taught a
 reconstructed Roman pronunciation.
  
   Latin is still spoken in Rome, at the Vatican.
  
   So there is a Roman pronunciation even today... (;
  
   Just kidding... although what I say is true...
  
   Alain
 
  How about a weekly radio news broadcast in Latin?
 
  http://www.yle.fi/ylenykko/nuntii.html (in Finnish :-)
  http://www.yle.fi/fbc/latini/ (in Latin)
  http://www.yle.fi/fbc/latini/summary.html (in English)

 --
 According to Murphy, nothing goes according to Hoyle.
 --
 Tex Texin  Director, International Business
 mailto:[EMAIL PROTECTED]  +1-781-280-4271 Fax:+1-781-280-4655
 Progress Software Corp.14 Oak Park, Bedford, MA 01730

 http://www.Progress.com#1 Embedded Database

 Globalization Program
 http://www.Progress.com/partners/globalization.htm
 --
-

Re: Transcriptions of Unicode

2001-01-15 Thread G. Adam Stanislav


At 14:11 15-01-2001 -0800, John Jenkins wrote:

On Monday, January 15, 2001, at 01:09 PM, G. Adam Stanislav wrote:


 Besides, the name of an international standard will be pronounced
 internationally.


Why?  I don't pronounce "Paris" the way the French do.  Why should I 
expect people from other countries to pronounce "Unicode" the way I do?

That's exactly what I said. Unicode as an international standard will
be pronounced internationally: Speakers of each language will have
their own pronunciation, and some will even spell it differently.

Adam

Re: Transcriptions of Unicode

2001-01-12 Thread Marco Cimarosti


Hallo everybody!

I don't fully agree with Mark Davis' IPA transcription of "Unicode":

http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Because:

1) I think that IPA transcriptions should be in [square brackets], while
phonemic transcriptions should be in /slashes/. If neither enclosing is
present, the transcription is ambiguous.

2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist
in any standard pronunciation of contemporary English. It should rather be
the diphthong [ou] (where the [u] would probably better be U+028A).

3) The transcription shows the primary stress on the first syllable, and a
secondary stress on the last one. In the few occasions when I heard native
English speakers saying "Unicode", I had the impression that it rather was
the other way round.

4) As "Unicode" is the proper name of an international standard, and it is
built with two English roots of French origin, it could as well be
considered a French word, which would lead to a totally different
transcription.

Sorry if I am repeating something already said by other people: I have been
off the list for a while. And, about points 2 and 3 above, beware that I am
a second language English speaker and that I don't have much experience of
American pronunciation.

Ciao.
Marco Cimarosti

Re: Transcriptions of Unicode

2001-01-12 Thread Lukas Pietsch


Marco Cimarosti wrote:

 I don't fully agree with Mark Davis' IPA transcription of "Unicode":


http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Neither do I, but partly for different reasons.


 1) I think that IPA transcriptions should be in [square brackets], while
 phonemic transcriptions should be in /slashes/. If neither enclosing is
 present, the transcription is ambiguous.

Right. And that's actually part of the key to the problem's answer:

 2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist
 in any standard pronunciation of contemporary English. It should rather be
 the diphthong [ou] (where the [u] would probably better be U+028A).

In America, transcribing the vowel in "code" as /o/ (and "made" as /e/) is
not uncommon, at least in *phonemic* transcription. Generally, American
accents have less diphthongization in these sounds than British accents
have, and phonemically it makes sense to see these sounds as part of the
series of "long vowels". A *narrow phonetic* transcription would have
something like [u+006F u+028A] for American, and [u+0259 u+028A] for
British.

 3) The transcription shows the primary stress on the first syllable, and a
 secondary stress on the last one. In the few occasions when I heard native
 English speakers saying "Unicode", I had the impression that it rather was
 the other way round.

I can't tell, because where I live I don't get to talk to native speakers
about Unicode a lot. But: According to standard word-formation and
pronunciation patterns in English, the stress pattern shown ('uni,code) is
absolutely what you'd expect: as in "uniform", "unisex", "unicorn",
"universe". (D. Jones, English Pronouncing Dictionary, doesn't even mark a
secondary stress on the third syllable at all.)

 4) As "Unicode" is the proper name of an international standard, and it is
 built with two English roots of French origin, it could as well be
 considered a French word, which would lead to a totally different
 transcription.

Right, but this particular pattern of merging word roots into a new word
does suggest English provenance, I think. And, historically, that's where
it did come from.

But there's another inconsistency in the transcription: the vowels in the
first ("u-") and third ("-code") syllable are both phonemically long.
Either you put the length mark on both (recommended for *phonetic*
transcription), or on neither (okay with *phonemic* transcription). (Of
course, if you transcribe the third syllable as a diphthong then you won't
get a length mark there.)

According to the conventions in D. Jones, English Pronouncing Dictionary,
you'd get something like:

[u+02C8 u+006A u+0075 u+02D0 u+006E u+026A  u+006B u+0259 u+028A u+0064]

Lukas


-
Lukas Pietsch
University of Freiburg
English Department

Phone (p.) (#49) (761) 696 37 23
mailto:[EMAIL PROTECTED]
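Lukas spells the transcription out as code points, presumably because IPA characters did not travel reliably over the list. As a minimal sketch, assuming the `u+XXXX` label format used in his message (the helper name `from_code_points` is hypothetical, chosen for illustration), Python's built-in `chr` and the standard `unicodedata` module can turn such a sequence back into the characters it names and list their official Unicode names. The sketch uses only the ten code points given; whether anything was lost at the double space between u+026A and u+006B is unclear.

```python
import unicodedata

# Code points from Lukas Pietsch's EPD-style transcription of "Unicode",
# copied as given in his message ("u+XXXX" labels).
EPD_UNICODE = ["u+02C8", "u+006A", "u+0075", "u+02D0", "u+006E",
               "u+026A", "u+006B", "u+0259", "u+028A", "u+0064"]


def from_code_points(labels):
    """Join 'u+XXXX' labels into the string of characters they name."""
    # Drop the two-character "u+" prefix and parse the rest as hexadecimal.
    return "".join(chr(int(label[2:], 16)) for label in labels)


if __name__ == "__main__":
    ipa = from_code_points(EPD_UNICODE)
    print("[" + ipa + "]")  # prints [ˈjuːnɪkəʊd]
    # List each character with its code point and official Unicode name.
    for ch in ipa:
        print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")
```

The same helper renders the narrow transcriptions Lukas gives for the vowel of "code": `from_code_points(["u+006F", "u+028A"])` gives "oʊ" (American) and `from_code_points(["u+0259", "u+028A"])` gives "əʊ" (British).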

Re: Transcriptions of Unicode

2001-01-12 Thread Mark Davis




Much as I admire and appreciate the French language (second only to
Italian), the proximate derivation of "Unicode" was not from that language,
and the transcription should not match the French pronunciation. Instead,
it has solid Northern Californian roots (even though not exactly dating
from the Gold Rush days).

According to the references I have, the prefix "uni" is directly from Latin
while the word "code" is through French. The Indo-European would have been
*oi-no-kau-do ("give one strike"): *kau apparently being related to such
English words as: hew, haggle, hoe, hag, hay, hack, caudad, caudal,
caudate, caudex, coda, codex, codicil, coward, incus, and Kovač (personal
name: 'smith'). I will leave the exact derivations to the exegetes, but I
like the association with "haggle" myself.

I will ask our resident phonetician about the IPA transcription. Clearly
Standard British English would add some interesting (and no doubt
valuable) complexities and nuances to the vowels, but that is not the goal
in this case. Even though "o" is often a diphthong in English, it is
probably better to have [o:] as a target for matching from other
languages, since [ou] may be considered slightly affected in the native
language.

The stress is definitely on the first syllable. One does hear some normal
generative English variations such as ˈjunəˌkoːd (schwa instead of short
i), but the stress still should be on the first syllable, as in "unify",
not later in the word as in "unique". Of course, the best approximation in
the target language should be used: if it does not allow the stress in
that position (without affectation), then the secondary stress should be
used.

Mark

- Original Message - 


From: "Marco Cimarosti" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, January 12, 2001 03:11
Subject: Re: Transcriptions of "Unicode"

Re: Transcriptions of Unicode: Still Missing scripts

2001-01-12 Thread Thomas Chan


On Thu, 11 Jan 2001, Mark Davis wrote:

 By the way, I am still missing the following. If anyone can supply them, I'd
 appreciate it.
 
 [BOPOMOFO]
[snip]
[MONGOLIAN]
[snip]
 See http://www.macchiato.com/unicode/Unicode_transcriptions.html for
 details.

It's still not very clear to me what this is supposed to be a list of.
The title says "Transcriptions of Unicode", and a note at the bottom says
"For non-Latin scripts the goal is to match the English pronunciation --
not spelling."

Some of the entries (leftmost column of the table) are names of languages,
while others are names of scripts.  e.g., "Russian" and "Japanese" are
names of languages, with examples given in Cyrillic and Katakana,
respectively.  For some scripts, there is basically only one language that
uses it, such as Katakana (used by Japanese) or Hangul (used by Korean),
while other scripts are used by many languages.  Is this supposed to
suggest that Russian is the representative language to give a Cyrillic
example in, and say, not Mongolian?

In some cases, it seems the example is not necessarily a transcription of
the English pronunciation, but a translation into another language,
most likely a loanword, with attendant sound changes.  e.g., Japanese
"yunikoodo".  I notice the lack of a request for an example using the
Hiragana script (which is also used by Japanese), which suggests that the
Japanese example is not a transcription of the English pronunciation into
Katakana, but a Japanese word (albeit a loanword).  Otherwise, it would be
possible to provide a Hiragana example, however nonsensical or nonexistent
it may be in reality.  There is also the particular case of the Chinese
entries, written in CJK "ideographs", which *are* translations using the
calque strategy.

It seems to me that this list is intended to showcase a variety of ways to
write "Unicode", be they transcriptions, transliterations, or
translations--whatever maximizes the number of scripts that one can show
off, apparently.

This raises some questions of what an example showcasing the Bopomofo
script should look like.  Basically, it is used only for Chinese,
primarily Mandarin (zh-guoyu).  It is also primarily an auxiliary script
for ruby annotation of Chinese text written in CJK "ideographs", although
it may stand alone.  So, if it is a transcription of English
pronunciation, then it will have to go through the language filter of
Mandarin Chinese, and this form may or may not be attested in 
reality--perhaps as a "best-fit" colloquial attempt to say a foreign
(English) word.  And this version would have the script standing alone.

Alternatively, it could be a transcription according to Mandarin Chinese
pronunciation of the already existing Chinese translations written in CJK
"ideographs".  In this case, it could either stand alone, or be attached
as ruby annotation to the CJK "ideograph" version (in Chinese).
Implementation-wise, it would be problematic seeing the Bopomofo at the
size it would be in for ruby annotation of text in a 96x24 bitmap (as
requested on the page).  Also, Bopomofo does have an inclination to be used
with Chinese text written top-to-bottom, so the horizontal shape of the
96x24 bitmap is problematic--more generally, vertically written scripts
such as the traditional Mongolian script (also requested) cannot be
demonstrated within this framework.


Thomas Chan
[EMAIL PROTECTED]

RE: Transcriptions of Unicode

2001-01-12 Thread Marco Cimarosti


Peter Constable wrote:
 I'd add the square brackets, an off-glide on the "o", and
 aspiration (02b0) after the "k".

Is that k aspirated? I do hear an aspiration when [p], [t] or [k] are at the
*beginning* of "words" (mainly because teachers told me I was supposed to
notice it), but I don't feel it *inside* a word.

 One other point:

Yes? :-)

Marco

Re: Transcriptions of Unicode

2001-01-12 Thread Thomas Chan


On Fri, 12 Jan 2001, Lukas Pietsch wrote:

 Marco Cimarosti wrote:
  3) The transcription shows the primary stress on the first syllable, and
 a
  secondary stress on the last one. In the few occasions when I heard
 native
  English speakers saying "Unicode", I had the impression that it rather
 was
  the other way round.
 
 I can't tell, because where I live I don't get to talk to native speakers
 about Unicode a lot. But: According to standard word-formation and

There is the "Unicode, Oh Unicode" anthem/hymn: sound files are located in
the /Other/Sounds/ directory on the CD-ROM published with the book, and
there is also an audio track on the same disc.  If this can be taken as an
official stance on the pronunciation of the term (the WhatIsThis.txt
explanatory file does not provide any clues), well, I do not know...


Thomas Chan
[EMAIL PROTECTED]

RE: Transcriptions of Unicode

2001-01-12 Thread Peter_Constable



On 01/12/2001 10:33:48 AM Marco Cimarosti wrote:

Is that k aspirated?

It is for any English speakers I've ever met.


 One other point:

Yes? :-)

Oops. It was to be the point about the aspirated k. I forgot to delete
that.


Peter

Re: Representation of aspiration (was: Re: Transcriptions of Unicode)

2001-01-12 Thread Richard Cook


Kenneth Whistler wrote:
 
 Richard Cook surmised:
 
  BTW, in a very close transcription, if one is using superscription
  (position above baseline) and relative size reduction to indicate
  aspiration, I suppose that degree of superscription or the size or both
  could be modulated to indicate degree of aspiration?
 
 Nah, if you tried to go down that path, you'd just end up with
 unrepresentable transcriptions and unreliable reproduction. I doubt
 that there are many transcribers who could reliably record more than
 three degrees of aspiration, anyway (roughly: slight aspiration,
 "normal" aspiration, and superaspiration).

Ken, I was only kidding ... mostly; I should have put a smiley in there
:-) But I was also thinking of the superscription question, which I
think Peter C. might like to discuss.
 
 Once you go past that level, which could be reliably indicated with
 appropriate use of diacritics, you are really into the realm of
 instrumental phonetics. I'd just hook up the machine and let it
 give you precise timings of voice delays after consonantal release
 in milliseconds.
 
 
  Or perhaps just mark-up the unsuperscripted aspiration indicator, to
  note degree of aspiration ... however you would like to measure that.
 
 No need to "mark it up". Just add another diacritic. That's how
 most transcribers would work, in practice.
 
Well, I was thinking of linking the transcription to the machine data
... so that the relation would be set on a compound key (aspiration
diacritic, measurement reference) ...

Re: Transcriptions of Unicode

2001-01-12 Thread Mark Davis


Thanks for your detailed note; I'll have to think it over.
...
 But there's another inconsistency in the transcription: the vowels in the
 first ("u-") and third ("-code") syllable are both phonemically long.
 Either you put the length mark on both (recommended for *phonetic*
 transcription), or on neither (okay with *phonemic* transcription). (Of

The o is significantly longer than the u, probably due to the following d.

...
 
 -
 Lukas Pietsch
 University of Freiburg
 English Department
 
 Phone (p.) (#49) (761) 696 37 23
 mailto:[EMAIL PROTECTED]

Re: Transcriptions of Unicode

2001-01-11 Thread Richard Cook


I see 2 Traditional Chinese translations here:

 http://www.macchiato.com/unicode/Unicode_transcriptions.html

Which one do people like?

http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif
http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif

Re: Transcriptions of Unicode

2001-01-11 Thread John Jenkins



On Thursday, January 11, 2001, at 10:25 AM, Richard Cook wrote:

 Which one do people like?

 http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif

Is much better.  "Unified Code"

 http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif


Stinks.  "Standard International Code"

RE: Transcriptions of Unicode

2001-01-11 Thread Pan, Jenny


Both
http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif
and
http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif
are used in Taiwan. If you type "Unicode" into the search field on the
Taiwan Yahoo page (http://tw.yahoo.com), you will find the term shown in
http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif.
See the Traditional Chinese web page at
http://www.unicode.org/unicode/standard/translations/t-chinese.html

The translation in
http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif
is used in China and Hong Kong; see the Simplified Chinese web page at
http://www.unicode.org/unicode/standard/translations/s-chinese.html

-Jenny Pan


Re: Transcriptions of Unicode

2001-01-11 Thread Thomas Chan


On Thu, 11 Jan 2001, Richard Cook wrote:

 I see 2 Traditional Chinese translations here:
  http://www.macchiato.com/unicode/Unicode_transcriptions.html
 Which one do people like?

 
http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif
 
http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif

Judging by usage, people seem to prefer the former ("tongyi ma") over the
latter ("biaozhun wanguo ma").

Some searches...

"tongyi ma" (U_Chinese2.gif):

Altavista: 66 matches
Yahoo (Chinese/Hong Kong/Taiwan): 78 matches
Microsoft Taiwan: 100 matches

("Yahoo Chinese" != "Yahoo China".  I couldn't get through to
Microsoft Hong Kong's search page.)

Also IUC10 page (http://www.unicode.org/iuc/iuc10/languages.html)
and Java glossary (http://java.sun.com/docs/glossaries/glossary.print.html)
agree.

"biaozhun wanguo ma" (U_Chinese3.gif):

Altavista: 7 matches
Yahoo (Chinese/Hong Kong/Taiwan): 1 match
Microsoft Taiwan: 78 matches


I do wonder, however, if "biaozhun wanguo ..." was meant as a translation
of "ISO ...".


Thomas Chan
[EMAIL PROTECTED]

Re: Transcriptions of Unicode

2001-01-11 Thread Richard Cook


John Jenkins wrote:
 
 On Thursday, January 11, 2001, at 10:25 AM, Richard Cook wrote:
 
 Which one do people like?

 
http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif
 
 Is much better.  "Unified Code"
 
This was my opinion too. I like "tongyima". And so far I haven't heard
from anyone advocating

http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif

 Stinks.  "Standard International Code"

Although "stinks" might be a little harsh. I'd opt for "opaque" :-)
Anyone else?

Re: Transcriptions of Unicode

2001-01-11 Thread Richard Cook


Jon Babcock wrote:
 
 At first glance, I agreed. But if the U_Chinese3.gif name gets
 shortened to the last three characters, wanguo ma, as I suspect it
 would in practice, I'd favor it slightly over the three-character
 tongyi ma of U_Chinese2.gif. FWIW. To me, wanguo ma emphasizes the
 multilingual aspect, whereas tongyi ma emphasizes the unifying aspect,
 but it isn't fully apparent, from the name (tongyi ma) alone, what is
 being unified.
 

Well, I'd say a problem with wanguo ma [lit. 'standard myriad-country
code'] is that it would be a better translation of Globalcode, rather
than of Unicode. All in favor of changing the standard name, say aye?

And is it apparent from the name "Unicode" alone that "Uni-" stands for
"Unified" and not, um, "Unicorn"? :-)

tongyi ma seems much more natural, less clunky to me ... but some people
prefer what I think is clunky, so I'm willing to admit that my opinion
of clunkiness may be completely subjective.

Here's the Unicode, courtesy of http://www.wenlin.com/ :

[U+6a19][U+6e96][U+842c][U+570b][U+78bc] biao1zhun3 wan4guo2 ma3
[U+7d71][U+4e00][U+78bc] tong3yi1 ma3

UTF-8:

標準萬國碼 biao1zhun3 wan4guo2 ma3
統一碼 tong3yi1 ma3
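The Wenlin code points above can be turned back into text, and into the UTF-8 byte sequences that the munged "UTF8" lines originally carried, with a few lines of Python. This is a minimal sketch. The helper name `decode_bracketed` is hypothetical, and the regular expression assumes Wenlin's `[U+xxxx]` notation exactly as shown above. The script prints only ASCII, meaning pinyin, code point labels and hex bytes, so it runs in any terminal.

```python
import re

def decode_bracketed(spec):
    """Hypothetical helper: turn "[U+7d71][U+4e00]..." into the characters those code points name."""
    return "".join(chr(int(digits, 16)) for digits in re.findall(r"U\+([0-9A-Fa-f]{4,6})", spec))

biaozhun_wanguo_ma = decode_bracketed("[U+6a19][U+6e96][U+842c][U+570b][U+78bc]")
tongyi_ma = decode_bracketed("[U+7d71][U+4e00][U+78bc]")

for pinyin, text in (("biao1zhun3 wan4guo2 ma3", biaozhun_wanguo_ma), ("tong3yi1 ma3", tongyi_ma)):
    utf8 = text.encode("utf-8")
    labels = " ".join(f"U+{ord(ch):04X}" for ch in text)
    # Each of these BMP ideographs takes three UTF-8 bytes, every one with the high bit set.
    print(pinyin, "|", labels, "|", utf8.hex(" "))
```

Because every byte is 0x80 or above, a mail path that only passes 7-bit data cannot carry these lines unencoded. That is consistent with what happened to them in the archive.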

Re: Transcriptions of Unicode

2000-12-14 Thread Mark Davis


Michael, that's great. Could you send the code points? (I couldn't use the
images -- if you can make a 96 x 24 GIF, I can use that).

Thanks,

Mark

- Original Message -
From: "Michael Everson" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Tuesday, December 12, 2000 09:01
Subject: Re: Transcriptions of Unicode


Ar 07:11 -0800 2000-12-12, scríobh Mark Davis:

ARMENIAN
 BULGARIAN
CHEROKEE
ETHIOPIC
 GREEK
GUJARATI
GURMUKHI
 INUKTITUT
OGHAM
RUNIC
 RUSSIAN
SINHALA
UCAS

See http://www.egt.ie/standards/iso10646/pdf/junikod.pdf


Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire

Re: Transcriptions of Unicode

2000-12-14 Thread Mark Davis


That matches what I have on
http://www.macchiato.com/unicode/Unicode_transcriptions.html, right?

(circle?)

Mark

- Original Message -
From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
To: "Mark Davis" [EMAIL PROTECTED]; "Unicode List" [EMAIL PROTECTED]
Sent: Thursday, December 14, 2000 11:25
Subject: Re: Transcriptions of Unicode


 Here is Hindi:

 यूिनकोड

 I was convinced that that circle was a mistake, but per my friend the
 native Hindi speaker: "that circle is right, and that's the char that gives
 the phonetic minor e"

 michka

 - Original Message -
 From: "Mark Davis" [EMAIL PROTECTED]
 To: "Unicode List" [EMAIL PROTECTED]
 Sent: Tuesday, December 12, 2000 7:11 AM
 Subject: Transcriptions of Unicode


  Some people were kind enough to send me extra transcriptions for
 
  http://www.macchiato.com/unicode/Unicode_transcriptions.html
 
  I am still missing confirmation on the Russian and Greek, and (at least
  one language in) the following scripts. Any help from native speakers
  would be appreciated.
 
  ARMENIAN
  BENGALI
  BOPOMOFO
  CHEROKEE
  ETHIOPIC
  GUJARATI
  GURMUKHI
  KANNADA
  KHMER
  LAO
  MALAYALAM
  MONGOLIAN
  MYANMAR
  OGHAM
  ORIYA
  RUNIC
  SINHALA
  SYRIAC
  TAMIL
  TELUGU
  THAANA
  THAI
  TIBETAN
  UCAS
  YI

Re: Transcriptions of Unicode

2000-12-14 Thread Michael (michka) Kaplan


Sorry, it was the fault of the machine I was on then, I think. I had
mistyped it (U+0928 after U+093F rather than before it). My friend concurred.

michka
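The correction comes down to Unicode's logical (storage) order. The Devanagari vowel sign I (U+093F) is stored after the consonant it belongs to, even though it is drawn to that consonant's left. When it is typed before NA (U+0928), it follows the vowel sign UU instead of a consonant. Many renderers then draw it on a dotted circle, which is probably the "circle" discussed above.

The Python sketch below checks for this kind of misordering. It rests on some assumptions. The range-based check is a rough heuristic written for this example, not a full Devanagari syllable validator. The function name `misplaced_vowel_signs` is hypothetical. The consonant and vowel-sign ranges are simplified to the main Devanagari letters.

```python
import unicodedata

# Simplified ranges assumed for this sketch (Devanagari block only):
#   consonants U+0915..U+0939 plus nukta forms U+0958..U+095F,
#   nukta U+093C, dependent vowel signs U+093E..U+094C.
CONSONANTS = set(range(0x0915, 0x093A)) | set(range(0x0958, 0x0960))
NUKTA = 0x093C
VOWEL_SIGNS = set(range(0x093E, 0x094D))

def misplaced_vowel_signs(text):
    """Hypothetical check: indices of vowel signs not directly preceded by a consonant or nukta."""
    bad = []
    for i, ch in enumerate(text):
        if ord(ch) in VOWEL_SIGNS:
            prev = ord(text[i - 1]) if i > 0 else None
            if prev not in CONSONANTS and prev != NUKTA:
                bad.append(i)
    return bad

# YA, UU sign, I sign, NA, KA, O sign, DDA: the I sign was typed before NA.
typed = "\u092F\u0942\u093F\u0928\u0915\u094B\u0921"
# Logical order: the I sign is stored after NA, although it is displayed to NA's left.
correct = "\u092F\u0942\u0928\u093F\u0915\u094B\u0921"

for label, s in (("typed", typed), ("correct", correct)):
    flagged = [unicodedata.name(s[i]) for i in misplaced_vowel_signs(s)]
    print(label, flagged)
```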


Re: Transcriptions of Unicode

2000-12-13 Thread 11digitboy


Who needs those mungers? Let's nuke them straight to
HELL. WITH a nuke. Or at least a couple hundred hand
grenades.

| ||\ __/__  |   |  _/_   | ||   /
| _|_  ,--, /   \  /_|  -+- / --- | /
|V T_)| |   |\   |   ||/ _
 \_/   T /  \   /  __/   |   /---  \_/ L/ \


 Sarasvati [EMAIL PROTECTED] wrote:
 Michka wrote:
 
  Ok, it happened again. I can send mail to other people and the
  encoding stays intact. Just the Unicode List is losing it.
  Does anyone have any ideas on this?
 
 Sarasvati contends that you're probably sending raw 8-bit mail
 over an SMTP connection without any indication of the encoding,
 nor any MIME headers in your message.  The raw message that was
 received by Unicode.ORG was _ALREADY_ munged into 7-bits, so the
 fault does not lie with Unicode.ORG.  Your original mail had this
 interesting header in it, which might be of some interest...
 
  Received: from 157.54.9.108 by mail5.microsoft.com (InterScan E-Mail
 VirusWall NT); Tue, 12 Dec 2000 10:20:42 -0800 (Pacific Standard Time)
  Received: by inet-imc-05.redmond.corp.microsoft.com with Internet
 Mail Service (5.5.2651.58)
 id YWS8WTM0; Tue, 12 Dec 2000 10:20:41 -0800
 
 Probably someone else is munging your mail on its way to me.
 
   -- Sarasvati
 


Re: Transcriptions of Unicode

2000-12-12 Thread Michael Everson


Ar 07:11 -0800 2000-12-12, scríobh Mark Davis:

ARMENIAN
 BULGARIAN
CHEROKEE
ETHIOPIC
 GREEK
GUJARATI
GURMUKHI
 INUKTITUT
OGHAM
RUNIC
 RUSSIAN
SINHALA
UCAS

See http://www.egt.ie/standards/iso10646/pdf/junikod.pdf


Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire

Re: Transcriptions of Unicode

2000-12-12 Thread Michael (michka) Kaplan


Here's Tamil (sorry I did not see this earlier on the list!)




MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/




Re: Transcriptions of Unicode

2000-12-12 Thread Michael (michka) Kaplan


Hmmm... I wonder how the UTF-8 encoding got lost? I will try one more
time.

Mark, let me know if the e-mail to you retained it.




MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/

Re: Transcriptions of Unicode

2000-12-12 Thread Michael (michka) Kaplan


Ok, it happened again. I can send mail to other people and the encoding
stays intact. Just the Unicode List is losing it. Does anyone have any ideas
on this?

The code points are:

U+0BAF U+0BC2 U+0BA9 U+0BBF U+0B95 U+0BCB U+0B9F U+0BCD

This is the form that INFITT (International Forum for Information Technology
in Tamil) has been using in its recent discussions.

michka
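The code points in that message can be checked mechanically. Below is a minimal Python sketch that rebuilds the string from the "U+XXXX" list. The helper name `parse_code_points` is hypothetical. The sketch also shows why the text was so fragile in transit: each Tamil code point becomes three UTF-8 bytes, all with the high bit set, so any relay that forces a message into 7 bits damages every byte. Clearing the high bit is used here only as one illustrative kind of damage. It is not a claim about what any particular server did.

```python
import unicodedata

TAMIL_UNICODE = "U+0BAF U+0BC2 U+0BA9 U+0BBF U+0B95 U+0BCB U+0B9F U+0BCD"

def parse_code_points(spec):
    """Hypothetical helper: turn space-separated "U+XXXX" tokens into the string they name."""
    return "".join(chr(int(token[2:], 16)) for token in spec.split())

text = parse_code_points(TAMIL_UNICODE)
for ch in text:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

utf8 = text.encode("utf-8")
print(len(text), "code points ->", len(utf8), "UTF-8 bytes")

# Every byte is >= 0x80, so none of them can pass a 7-bit-only path unencoded.
print("all bytes non-ASCII:", all(b >= 0x80 for b in utf8))

# Illustration only: stripping the high bit leaves plain ASCII bytes
# that no longer decode back to the original text.
munged = bytes(b & 0x7F for b in utf8)
print("munged differs:", munged.decode("ascii") != text)
```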



Re: Transcriptions of Unicode

2000-12-12 Thread Shigemichi Yazawa


At Tue, 12 Dec 2000 10:25:59 -0800 (GMT-0800),
Michael (michka) Kaplan [EMAIL PROTECTED] wrote:
 Ok, it happened again. I can send mail to other people and the encoding
 stays intact. Just the Unicode List is losing it. Does anyone have any ideas
 on this?

I think that's because the list server strips off almost all the mail
header information. The server should retain the

MIME-Version: 
Content-Type: 

headers to allow mail clients to display the message in the right
encoding.

It would be even better if the server retained the In-Reply-To: header so
that I could view the messages as threads.

---
Shigemichi Yazawa
[EMAIL PROTECTED]

Re: Transcriptions of Unicode

2000-12-12 Thread Sarasvati


Michka wrote:

 Ok, it happened again. I can send mail to other people and the
 encoding stays intact. Just the Unicode List is losing it.
 Does anyone have any ideas on this?

Sarasvati contends that you're probably sending raw 8-bit mail
over an SMTP connection without any indication of the encoding,
nor any MIME headers in your message.  The raw message that was
received by Unicode.ORG was _ALREADY_ munged into 7-bits, so the
fault does not lie with Unicode.ORG.  Your original mail had this
interesting header in it, which might be of some interest...

 Received: from 157.54.9.108 by mail5.microsoft.com (InterScan E-Mail VirusWall 
NT); Tue, 12 Dec 2000 10:20:42 -0800 (Pacific Standard Time)
 Received: by inet-imc-05.redmond.corp.microsoft.com with Internet Mail Service 
(5.5.2651.58)
id YWS8WTM0; Tue, 12 Dec 2000 10:20:41 -0800

Probably someone else is munging your mail on its way to me.

-- Sarasvati
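Sarasvati's diagnosis is that the message left the sender as raw 8-bit text with no MIME labeling. Below is a minimal sketch of a properly labeled message, built with Python's standard `email` package. The addresses are placeholders, and base64 is one choice among several; quoted-printable would also be 7-bit safe. The body declares `charset="utf-8"`, and the base64 transfer encoding keeps every byte on the wire within 7-bit ASCII. A relay that cannot pass 8-bit data then has nothing to damage.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def build_utf8_message(sender, recipient, subject, body):
    """Sketch: a text/plain message whose charset and transfer encoding are both declared."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # charset labels the body; cte="base64" keeps the encoded body 7-bit clean.
    # EmailMessage.set_content also adds "MIME-Version: 1.0" if it is missing.
    msg.set_content(body, charset="utf-8", cte="base64")
    return msg

body = "\u0BAF\u0BC2\u0BA9\u0BBF\u0B95\u0BCB\u0B9F\u0BCD"  # the Tamil transcription from this thread
msg = build_utf8_message("sender@example.org", "list@example.org",
                         "Re: Transcriptions of Unicode", body)
raw = msg.as_bytes()

print("MIME-Version:", msg["MIME-Version"])
print("Content-Type:", msg["Content-Type"])
print("Content-Transfer-Encoding:", msg["Content-Transfer-Encoding"])
print("7-bit clean:", all(b < 0x80 for b in raw))

# A receiver that honors the headers recovers the original text.
round_trip = message_from_bytes(raw, policy=policy.default)
print("round trip ok:", round_trip.get_content().rstrip("\n") == body)
```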

Re: Transcriptions of Unicode

2000-12-12 Thread Sarasvati


Darlings,

Shigemichi Yazawa wrote:

 I think that's because the list server strip off almost all the mail
 header information. The server should retain
 MIME-Version: 
 Content-Type: 

On the contrary, Sarasvati is a highly discerning stripper,
and certainly does not remove anything so essential as MIME
headers.  If you look at your own message, as massaged, you
will find your MIME headers intact.  And I even know your
ditch-dwelling flagellum-waving protozoan mailer's name.

 MIME-version: 1.0 (generated by EMIKO 1.13.9 - "Euglena tripteris")
 Content-type: text/plain; charset=US-ASCII

Euglena tripteris.  First isolated by E. G. Pringsheim in 1943.
See Sweet Emiko up-close in all her 140-micron glory at:

http://taxa.soken.ac.jp/WWW/PDB/PCD2460/D/07.jpg

Your cheeky,

-- Sarasvati

Re: Transcriptions of Unicode

2000-12-12 Thread Michael (michka) Kaplan


Interesting... strange how other people I send e-mail to do not have this
problem?

Let me try one more time. :-)

யூனிகோட்


MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/



Re: Transcriptions of Unicode

2000-12-12 Thread Mark Leisher



Michael Interesting... strange how other people I send e-mail to do not
Michael have this problem?

It came through this time, even on my stone-age mail reader.

Given a widely used homogeneous system like Windows, I wouldn't be surprised
if the recipients that successfully viewed the original were running Windows
too.
-
Mark Leisher
Computing Research Lab          Cinema, radio, television, magazines are a
New Mexico State University     school of inattention: people look without
Box 30001, Dept. 3CRL           seeing, listen without hearing.
Las Cruces, NM  88003               -- Robert Bresson

Re: Transcriptions of Unicode

2000-12-12 Thread Michael (michka) Kaplan

Ah, I actually changed to a new SMTP server, hoping it would be a bit more
advanced. It appears to be a lot more up to date!

michka


Re: Transcriptions of Unicode

2000-12-08 Thread Curtis Clark


At 03:01 PM 12/8/00, John H. Jenkins wrote:
Yes, this is really true.  If someone were reading an extended text or an 
entire book in Chinese, they might prefer to see the Chinese glyphs, but 
isolated words, quotations, and short passages are printed with Japanese ones.

This is not unique to Chinese/Japanese. When I learned German many years 
ago, the textbook printed German in Fraktur, so that the students would 
gain experience in reading older German texts. But German quotes in English 
text have almost invariably been in the same face as the English.

-- 
Curtis Clark  http://www.csupomona.edu/~jcclark/
Biological Sciences Department Voice: (909) 869-4062
California State Polytechnic University  FAX: (909) 869-4078
Pomona CA 91768-4032  USA  [EMAIL PROTECTED]

Re: displaying Unicode text (was Re: Transcriptions of Unicode)

2000-12-07 Thread Erik van der Poel


Mark Davis wrote:
 
 Let's take an example.
 
 - The page is UTF-8.
 - It contains a mixture of German, dingbats and Hindi text.
 - My locale is de_DE.
 
 From your description, it sounds like Mozilla works as follows:
 
 - The locale maps (I'm guessing) to 8859-1
 - 8859 maps to, say Helvetica.
 - The dingbats and Hindi appear as boxes or question marks.
 
 This would be pretty lame, so I hope I misunderstand you!!

Sorry, I've been abbreviating quite a bit, so I left out a lot. Yes,
you've misunderstood me, but only because I abbreviated so much. Sorry.
Let me try again, with more feeling this time.

Using the example above:

- The locale maps to "x-western" (ja_JP would map to "ja", so I've
prepended "x-" for the "language groups" that don't exist in RFC 1766)

- x-western and CSS' sans-serif map to Arial

- The dingbats appear as dingbats if they are in Unicode and at least
one of the dingbat fonts on the system has a Unicode cmap subtable
(WingDings is a "symbol" font, so it doesn't have such a table), while
the Hindi might display OK on some Windows systems if they have Hindi
support (Mozilla itself does not support any Indic languages yet).

We could support the WingDings font if we add an entry for WingDings to
the following table:

http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#872

We just haven't done that yet.

Basically, Mozilla will look at all the fonts on the system to find one
that contains a glyph for the current character.

The language group and user locale stuff that I mentioned earlier is
only one part of the process -- the part that deals with the user's font
preferences. I'll explain more of the rest of the process:

Mozilla implements CSS2's font matching algorithm:

  http://www.w3.org/TR/REC-CSS2/fonts.html#algorithm

This states that *for each character* in the element, the implementation
is supposed to go down the list of fonts in the font-family property, to
find a font that exists and that contains a glyph for the current
character. Mozilla implements this algorithm to the letter, which means
that fonts are chosen for each character without regard for neighboring
characters (unlike MSIE). This may actually have been a bad decision,
since we sometimes end up with text that looks odd due to font changes.

Anyway, Mozilla's algorithm has the following steps:

1. "User-Defined" font
2. CSS font-family property
3. CSS generic font (e.g. serif)
4. list of all fonts on system
5. transliteration
6. question mark

You can see these steps in the following pieces of code:

http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#2642

http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3108

1. "User-Defined" font (FindUserDefinedFont)

We decided to include the User-Defined font functionality in Netscape 6
again. It is similar to the old Netscape 4.X. Basically, if the user
selects this encoding from the View menu, then the browser passes the
bytes through to the font, untouched. This is for charsets that we don't
already support. This step needs to be the first step, since it
overrides everything else.

2. CSS font-family property (FindLocalFont)

If the user hasn't selected User-Defined, we invoke this routine. It
simply goes down the font-family list to find a font that exists and
that contains a glyph for the current character. E.g.:

  font-family: Arial, "MS Gothic", sans-serif;

3. CSS generic font (FindGenericFont)

If the above fails, this routine tries to find a font for the CSS
generic (e.g. sans-serif) that was found in the font-family property, if
any, otherwise it falls back to the user's default (serif or
sans-serif). This is where the font preferences come in, so this is
where we try to determine the language group of the element. I.e. we
take the LANG attribute of this element or a parent element if any,
otherwise the language group of the document's charset, if
non-Unicode-based, otherwise the user's locale's language group.
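The language-group lookup in step 3 is a short chain of fallbacks. The Python sketch below illustrates it under stated assumptions. The mapping tables are tiny stand-ins that contain only pairs mentioned in this thread (de_DE to x-western, ja_JP to ja, iso-8859-1 to Western, shift_jis to ja). The function names are hypothetical. The sketch chooses to let an unrecognized non-Unicode charset fall through to the locale; it does not claim that Mozilla's code does the same.

```python
# Illustrative tables only; entries mirror examples given in this thread.
LANG_TO_GROUP = {"de": "x-western", "ja": "ja"}
LOCALE_TO_GROUP = {"de_DE": "x-western", "ja_JP": "ja"}
CHARSET_TO_GROUP = {"iso-8859-1": "x-western", "shift_jis": "ja"}
UNICODE_CHARSETS = {"utf-8", "utf-16"}

def lang_to_group(lang):
    """Simplified: map a LANG value such as "ja" or "de-AT" by its primary subtag."""
    primary = lang.split("-")[0].lower()
    return LANG_TO_GROUP.get(primary, primary)

def language_group(lang_chain, doc_charset, user_locale):
    """lang_chain: LANG values from the element outward through its ancestors (None where absent)."""
    # 1. LANG attribute of this element, or of the nearest ancestor that has one.
    for lang in lang_chain:
        if lang:
            return lang_to_group(lang)
    # 2. Language group of the document's charset, if that charset is not Unicode-based.
    charset = doc_charset.lower()
    if charset not in UNICODE_CHARSETS and charset in CHARSET_TO_GROUP:
        return CHARSET_TO_GROUP[charset]
    # 3. Group containing the user's locale language ("x-western" is an assumed default).
    return LOCALE_TO_GROUP.get(user_locale, "x-western")

# Mark's example: a UTF-8 page with no LANG attributes, viewed in a de_DE locale.
print(language_group([], "UTF-8", "de_DE"))  # x-western
```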

4. list of all fonts on system (FindGlobalFont)

If the above fails, this routine goes through all fonts on the system,
trying to find one that contains a glyph for the current character.

5. transliteration (FindSubstituteFont)

If we still can't find a font for this character, we try a
transliteration table. For example, the euro is mapped to the three ASCII
characters "EUR", which is useful on some Unix systems that don't have the
euro glyph yet. Actually, this transliteration step isn't even implemented
on Windows yet.

6. question mark (FindSubstituteFont)

If we can't find a transliteration, we fall back to the last resort --
the good ol' question mark.
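The six steps above can be condensed into a single per-character lookup. The Python sketch below is not Mozilla's code; the real routines are the C++ functions linked above. The `Font` class, its coverage sets and the `pick_font` function are hypothetical stand-ins, and a coverage set plays the role of a Unicode cmap lookup. "Arial" and "MS Gothic" come from the font-family example in the message. "HypotheticalDevanagari" is an invented name, and none of the stand-in fonts cover the euro, so that character reaches step 5.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Font:
    """Stand-in for an installed font; `covers` plays the role of a Unicode cmap lookup."""
    name: str
    covers: frozenset

    def has_glyph(self, ch):
        return ord(ch) in self.covers

# Step 5 table: the euro -> "EUR" entry is from the message; the table itself is a stand-in.
TRANSLITERATIONS = {"\u20ac": "EUR"}

def pick_font(ch, installed, font_family, generic_font=None, user_defined=None):
    """Return (font name or None, text to draw) for one character, trying the six steps in order."""
    # 1. User-Defined font overrides everything; the bytes pass through untouched.
    if user_defined is not None:
        return user_defined.name, ch
    # 2. CSS font-family list: first listed font that exists and has the glyph.
    for name in font_family:
        font = installed.get(name)
        if font is not None and font.has_glyph(ch):
            return name, ch
    # 3. Font the user's preferences assign to the CSS generic family for this language group.
    font = installed.get(generic_font) if generic_font else None
    if font is not None and font.has_glyph(ch):
        return font.name, ch
    # 4. Any installed font with the glyph.
    for font in installed.values():
        if font.has_glyph(ch):
            return font.name, ch
    # 5. Transliteration (None = draw the replacement with whatever default font applies).
    if ch in TRANSLITERATIONS:
        return None, TRANSLITERATIONS[ch]
    # 6. Last resort.
    return None, "?"

installed = {
    "Arial": Font("Arial", frozenset(range(0x20, 0x100))),
    "MS Gothic": Font("MS Gothic", frozenset(range(0x3040, 0x3100))),
    "HypotheticalDevanagari": Font("HypotheticalDevanagari", frozenset(range(0x0900, 0x0980))),
}
family = ["Arial", "MS Gothic"]  # as in: font-family: Arial, "MS Gothic", sans-serif;

for ch in "A\u3042\u0915\u20ac\u2603":
    name, drawn = pick_font(ch, installed, family, generic_font="Arial")
    print(f"U+{ord(ch):04X} -> {name} {drawn!a}")
```

Because each character is resolved independently, neighboring characters can land in different fonts. That is the "text that looks odd due to font changes" effect described above.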

That's it. I hope I didn't abbreviate too much this time!

Erik

Re: Transcriptions of Unicode

2000-12-07 Thread David Starner


On Wed, Dec 06, 2000 at 11:12:24PM -0800, James Kass wrote:
 As for Chinese users searching for Chinese
 strings, Japanese text will most probably be incomprehensible
 regardless of font or mark-up. 

That's true for pretty much every other pair of languages that use the
same script, though.

-- 
David Starner - [EMAIL PROTECTED]
http://dvdeug.dhis.org
"(You see, the best way to solve a problem is to rigorously define it in
terms of other people's problems and then run away quickly.)"
   -- Roland McGrath [EMAIL PROTECTED]

Re: displaying Unicode text (was Re: Transcriptions of Unicode)

2000-12-07 Thread Mark Davis


Thanks! I appreciate the description. My fears were unfounded.

 This states that *for each character* in the element, the implementation
 is supposed to go down the list of fonts in the font-family property, to
 find a font that exists and that contains a glyph for the current
 character.

I agree that this does not produce the optimal results, since one should
have the freedom to select different fonts based on the context of the
character. The above description is much better than a very coarse-grained
approach (like having the entire document or element in the same font), but
needs some more wriggle-room to allow people flexibility.
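Mark's point, that font choice should be able to take context into account, can be illustrated by contrasting a strictly per-character choice with one simple context-aware heuristic. The "keep the previous character's font when it also covers this one" rule below is a hypothetical illustration. It is not a description of how Mozilla or MSIE actually behave. The font names and coverage sets are stand-ins, and the function names are invented for the sketch.

```python
def first_covering(ch, fonts):
    """CSS2-style choice: the first font, in priority order, that has a glyph for ch."""
    return next((name for name, covered in fonts if ord(ch) in covered), None)

def per_character(text, fonts):
    return [first_covering(ch, fonts) for ch in text]

def context_aware(text, fonts):
    """Hypothetical heuristic: reuse the previous character's font whenever it covers ch."""
    coverage = dict(fonts)
    chosen, prev = [], None
    for ch in text:
        name = prev if prev is not None and ord(ch) in coverage[prev] else first_covering(ch, fonts)
        chosen.append(name)
        prev = name
    return chosen

def font_switches(names):
    return sum(1 for a, b in zip(names, names[1:]) if a != b)

ascii_range = set(range(0x20, 0x7F))
fonts = [
    ("Arial", ascii_range),
    # Stand-in coverage: ASCII plus the two ideographs used in the sample text.
    ("MS Gothic", ascii_range | {0x65E5, 0x672C}),
]
text = "\u65e5\u672c 2000"

for label, pick in (("per-character", per_character), ("context-aware", context_aware)):
    names = pick(text, fonts)
    print(label, names, "switches:", font_switches(names))
```

The trade-off is that the heuristic can override the author's font-family priority order. Here it keeps the digits in MS Gothic even though Arial is listed first. That is the kind of flexibility, and the risk, that such "wriggle-room" implies.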

Mark


Re: Transcriptions of Unicode

2000-12-06 Thread addison


But NN6 *does* select a font for characters outside the so-called user's
locale when said characters are in a UTF-8 page. It appears that this
mechanism is somewhat haphazard for CJK unified ideographs: I get a mix of
fonts usually (probably because 'ja' is in my locale "stack" currently and
'zh' and 'ko' are not, so I guess Japanese fonts are preferred for
characters that are in JIS X 0208??).

AP

===
Addison P. Phillips    Principal Consultant
Inter-Locale LLC    http://www.inter-locale.com
Los Gatos, CA, USA  mailto:[EMAIL PROTECTED]

+1 408.210.3569 (mobile)  +1 408.904.4762 (fax)
===
Globalization Engineering & Consulting Services

On Mon, 4 Dec 2000, Erik van der Poel wrote:

 Mark Davis wrote:
  
  What wasn't clear from his message
  is whether Mozilla picks a reasonable font if the language is not there.
 
 Sorry about the lack of clarity. When there is no LANG attribute in the
 element (or in a parent element), Mozilla uses the document's charset as
 a fallback. Mozilla has font preferences for each language group. The
 language groups have been set up to have a one-to-one correspondence
 with charsets (roughly). E.g. iso-8859-1 - Western, shift_jis - ja.
 When the charset is a Unicode-based one (e.g. UTF-8), then Mozilla uses
 the language group that contains the user's locale's language.
 
 In other words, Mozilla does not (yet) use the Unicode character codes
 to select fonts. We may do this in the future.
 
 Erik

Re: Transcriptions of Unicode

2000-12-06 Thread James Kass



Erik van der Poel wrote:


 
 The font selection is indeed somewhat haphazard for CJK when there are
 no LANG attributes and the charset doesn't tell us anything either, but
 then, what do you expect in that situation anyway? I suppose we could
 deduce that the language is Japanese for Hiragana and Katakana, but what
 should we do about ideographs? Don't tell me the browser has to start
 guessing the language for those characters. I've had enough of the
 guessing game. We have been doing it for charsets for years, and it has
 led to trouble that we can't back out of now. I think we need to draw
 the line here, and tell Web page authors to mark their pages with LANG
 attributes or with particular fonts, preferably in style sheets.


A Universal Character Set should not require mark-up/tags.

If the Japanese version of a Chinese character looks different
than the Chinese character, it *is* different.  In many cases,
"variant" does not mean "same".

When limited to BMP code points, CJK unification kind of made
sense.  In light of the new additional planes...

The IRG seems to be doing a fine job.

Best regards,

James Kass.

Re: Transcriptions of Unicode

2000-12-06 Thread John H. Jenkins


At 3:57 PM -0800 12/6/00, James Kass wrote:
A Universal Character Set should not require mark-up/tags.

Au contraire, it's been implicit in the design of Unicode from the 
beginning that markup/tags would be required in certain situations. 

If the Japanese version of a Chinese character looks different
than the Chinese character, it *is* different.  In many cases,
"variant" does not mean "same".

But as a rule, the Japanese and Chinese would disagree with you here. 
Certainly the IRG would disagree.  Few in the west would argue over 
the fundamental unity of Fraktur and Roman variations of the Latin 
alphabet; most of the Chinese/Japanese variations are on that order 
or less.


When limited to BMP code points, CJK unification kind of made
sense.  In light of the new additional planes...

The IRG seems to be doing a fine job.


Here you've really lost me.  The IRG is unifying in plane 2, as well. 
Nobody in the IRG has suggested that we abandon unification for plane 
2.

-- 
=
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/

Re: Transcriptions of Unicode

2000-12-06 Thread Erik van der Poel


James Kass wrote:
 
 Erik van der Poel wrote:
 
  The font selection is indeed somewhat haphazard for CJK when there are
  no LANG attributes and the charset doesn't tell us anything either, but
  then, what do you expect in that situation anyway? I suppose we could
  deduce that the language is Japanese for Hiragana and Katakana, but what
  should we do about ideographs? Don't tell me the browser has to start
  guessing the language for those characters. I've had enough of the
  guessing game. We have been doing it for charsets for years, and it has
  led to trouble that we can't back out of now. I think we need to draw
  the line here, and tell Web page authors to mark their pages with LANG
  attributes or with particular fonts, preferably in style sheets.
 
 A Universal Character Set should not require mark-up/tags.
 
 If the Japanese version of a Chinese character looks different
 than the Chinese character, it *is* different.  In many cases,
 "variant" does not mean "same".

I was referring to the CJK Unified Ideographs in the range U+4E00 to
U+9FA5. I agree that those codes do not *require* mark-up/tags, but if
the author wishes to have them displayed with a "Japanese font", then
they must indicate the language or specify the font directly. The latter
may be problematic. I don't think it's reasonable to expect a browser to
apply various heuristics to determine the language.

 When limited to BMP code points, CJK unification kind of made
 sense.  In light of the new additional planes...
 
 The IRG seems to be doing a fine job.

Somehow I get the impression that you have more to say, but you just
aren't saying it. Cough it up already. :-)

Erik

Re: Transcriptions of Unicode

2000-12-06 Thread James Kass


Erik van der Poel wrote:

  
   The font selection is indeed somewhat haphazard for CJK when there are
   no LANG attributes and the charset doesn't tell us anything either, but
   then, what do you expect in that situation anyway? I suppose we could
   deduce that the language is Japanese for Hiragana and Katakana, but what
   should we do about ideographs? Don't tell me the browser has to start
   guessing the language for those characters. I've had enough of the
   guessing game. We have been doing it for charsets for years, and it has
   led to trouble that we can't back out of now. I think we need to draw
   the line here, and tell Web page authors to mark their pages with LANG
   attributes or with particular fonts, preferably in style sheets.
 
  A Universal Character Set should not require mark-up/tags.
 
  If the Japanese version of a Chinese character looks different
  than the Chinese character, it *is* different.  In many cases,
  "variant" does not mean "same".

 I was referring to the CJK Unified Ideographs in the range U+4E00 to
 U+9FA5. I agree that those codes do not *require* mark-up/tags, but if
 the author wishes to have them displayed with a "Japanese font", then
 they must indicate the language or specify the font directly. The latter
 may be problematic. I don't think it's reasonable to expect a browser to
 apply various heuristics to determine the language.


I completely agree that it is not reasonable to expect a browser
to guess the language.  Since browsers primarily display
information, the browser doesn't really need to be language-aware
in most cases.  Exceptions like word-breaks for Thai and related
scripts exist, of course.  Even scripts which don't use spaces
or other word breaks can be encoded with the special spacing
variants available in the Unicode Standard, though.

  When limited to BMP code points, CJK unification kind of made
  sense.  In light of the new additional planes...
 
  The IRG seems to be doing a fine job.

 Somehow I get the impression that you have more to say, but you just
 aren't saying it. Cough it up already. :-)


Sorry, I'm trying to learn how to be brief (!) and hoped the
inference would be apparent.  Although the IRG still
considers unification relevant, it seems to me that they
are much tighter now in their definition of 'sameness'
than was previously the case.  Not all of the approx 4
"new" characters in Plane 2 are the names of race horses,
some of them, as far as I can tell, would have been unified
before.

Consider the "teeth" ideograph(s).  (Radical number 211, in
some radical lists.)  Because this is a radical, CJK encoders
can select the specific desired character:  
U+2FD2 for Traditional Chinese
U+2EED for Japanese
U+2EEE for Simplified Chinese
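The code points in that list can be inspected with Python's standard `unicodedata` module. This is a small sketch, assuming a Python 3 interpreter whose Unicode data is whatever version it ships, not the 2000-era tables. It shows that the three radical forms are separately encoded, and that NFKC normalization folds the Kangxi radical U+2FD2 back to the unified ideograph U+9F52 while leaving the two CJK Radicals Supplement forms unchanged.

```python
import unicodedata

# The "teeth" forms cited above; the labels follow the message.
TOOTH_FORMS = [
    (0x9F52, "unified ideograph"),
    (0x2FD2, "Traditional Chinese radical"),
    (0x2EED, "Japanese radical"),
    (0x2EEE, "Simplified Chinese radical"),
]

for cp, label in TOOTH_FORMS:
    ch = chr(cp)
    folded = unicodedata.normalize("NFKC", ch)
    # U+2FD2 has a compatibility decomposition to U+9F52, so NFKC folds it;
    # U+2EED and U+2EEE have no decomposition and pass through unchanged.
    print(f"U+{cp:04X} {unicodedata.name(ch)} ({label}) -> NFKC U+{ord(folded):04X}")
```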

Since anyone encoding U+9F52 might see any of the above
three versions, my opinion is that encoders (authors) would 
wish to explicitly encode their expected character and would
do so whenever they have the option.  I believe that they
should have the option.  The abundance of unassigned code
points offered by additional Unicode planes makes this
possible and would eliminate the need for a browser
(or any other application) to "guess" a language in order
to display material as its authors and users desire.

Best regards,

James Kass.

Re: Transcriptions of Unicode

2000-12-06 Thread John H. Jenkins


At 6:40 PM -0800 12/6/00, James Kass wrote:
Consider the "teeth" ideograph(s).  (Radical number 211, in
some radical lists.)  Because this is a radical, CJK encoders
can select the specific desired character: 
U+2FD2 for Traditional Chinese
U+2EED for Japanese
U+2EEE for Simplified Chinese

Since anyone encoding U+9F52 might see any of the above
three versions, my opinion is that encoders (authors) would
wish to explicitly encode their expected character and would
do so whenever they have the option.

This doesn't reflect, however, the way people actually use these 
ideographs.  By and large, the Japanese reader wants to see them 
drawn with the Japanese glyph, whether or not the originator was 
Chinese.

There are some cases where the specific glyph *does* matter, largely 
in personal names.  (We had a mildly heated discussion this morning 
in the IRG meeting going on about how to show one particular glyph 
for precisely this reason.) By and large, however, it is recognized 
that the glyph differences do *not* affect meaning and should be up 
to the reader, not forced by the originator.

I believe that they
should have the option.  The abundance of unassigned code
points offered by additional Unicode planes makes this
possible and would eliminate the need for a browser
(or any other application) to "guess" a language in order
to display material as its authors and users desire.


But then why not deunify the English and French alphabets?  Or French 
and Polish accents?  Or Fraktur and Italic and Roman styles of Latin?

-- 
=
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/

Re: displaying Unicode text (was re: Transcriptions of Unicode)

2000-12-06 Thread James Kass


John H. Jenkins wrote:

 At 3:57 PM -0800 12/6/00, James Kass wrote:
 A Universal Character Set should not require mark-up/tags.
 
 Au contraire, it's been implicit in the design of Unicode from the 
 beginning that markup/tags would be required in certain situations. 


Because of the 65,536-character limitation?  (Which no
longer applies.)
 
 If the Japanese version of a Chinese character looks different
 than the Chinese character, it *is* different.  In many cases,
 "variant" does not mean "same".
 
 But as a rule, the Japanese and Chinese would disagree with you here. 
 Certainly the IRG would disagree.  Few in the west would argue over 
 the fundamental unity of Fraktur and Roman variations of the Latin 
 alphabet; most of the Chinese/Japanese variations are on that order 
 or less.
 

As our Asian friends come on-line, they will hopefully
contribute to the discussion in this regard.  The reason
I suspect that the Japanese would tend to agree is that
Unicode had not been widely accepted by the Japanese
user community.  

Perhaps if Unicode had originated elsewhere, we would have 
had to deal with Greek/Latin/Cyrillic unification?  
(And we could say that since the "W" is really a ligature 
of two "V"s, it shouldn't have an explicit encoding...)

 
 When limited to BMP code points, CJK unification kind of made
 sense.  In light of the new additional planes...
 
 The IRG seems to be doing a fine job.
 
 
 Here you've really lost me.  The IRG is unifying in plane 2, as well. 
 Nobody in the IRG has suggested that we abandon unification for plane 
 2.
 

I tried to respond to this in an earlier letter.  We don't 
even have CJK unification in the BMP, witness the blocks
U+8A00 to U+8B9F versus U+8BA0 to U+8C36.  Many of
the characters in the latter block are simplified versions
of the former.

U+8A02/U+8BA2
U+8A03/U+8BA3
U+8A0C/U+8BA7
U+8A41/U+8BC2
etc.
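The point about these ranges can be checked with Python's `unicodedata` module. A minimal sketch using the block ranges and pairs quoted above: each pair is two distinct code points, and Unicode normalization does not relate them. The traditional/simplified link is recorded instead in Unihan fields such as kSimplifiedVariant and kTraditionalVariant, which `unicodedata` does not expose. `check` is a hypothetical helper.

```python
import unicodedata

TRADITIONAL = range(0x8A00, 0x8BA0)  # U+8A00..U+8B9F, as quoted above
SIMPLIFIED = range(0x8BA0, 0x8C37)   # U+8BA0..U+8C36, as quoted above
PAIRS = [(0x8A02, 0x8BA2), (0x8A03, 0x8BA3), (0x8A0C, 0x8BA7), (0x8A41, 0x8BC2)]


def check(pairs):
    """Return one ASCII report line per traditional/simplified pair."""
    lines = []
    for trad, simp in pairs:
        same_after_nfkc = (unicodedata.normalize("NFKC", chr(trad))
                           == unicodedata.normalize("NFKC", chr(simp)))
        lines.append(f"U+{trad:04X} in traditional block: {trad in TRADITIONAL}; "
                     f"U+{simp:04X} in simplified block: {simp in SIMPLIFIED}; "
                     f"equal after NFKC: {same_after_nfkc}")
    return lines


for line in check(PAIRS):
    print(line)
```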

Fraktur and roman are both adaptations of the Latin
script, or stylistic variations just as italic and roman.  
The Japanese writing system is Japanese, but derived 
from Chinese.  As you say, some of the differences
are minimal, perhaps slight variation in stroke order,
but other differences are substantial.  In some cases,
the Japanese version may use a variant of a certain
radical component, or even a different radical.  I said
I think the IRG is doing a fine job because it is such a
monumental task, much progress is being made, and the
results of their work seem to reflect the expectations
of the various user communities involved.

Best regards,

James Kass.

Re: Transcriptions of Unicode

2000-12-04 Thread addison


Hi Mark,

You're right, but I believe what Erik is saying is that you can get
Japanese-looking characters to be *preferred* over Chinese-looking
characters (where fonts drawn in both styles are available) by using a
LANG attribute for a specific page or SPAN. This could increase the
acceptance of using UTF-8 as a page encoding in Asia.

Best Regards,

Addison


On Sat, 2 Dec 2000, Mark Davis wrote:

 Won't Mozilla pick fonts based on character code? The only ones in the list
 that couldn't be deduced from that would be the Yiddish and the Chinese.
 
 Mark
 
 - Original Message -
 From: "Erik van der Poel" [EMAIL PROTECTED]
 To: "Unicode List" [EMAIL PROTECTED]
 Cc: "Unicode List" [EMAIL PROTECTED]
 Sent: Friday, December 01, 2000 22:46
 Subject: Re: Transcriptions of "Unicode"
 
 
  Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use
  the fonts that have been set up for those languages. E.g.:
 
<span lang="ja" title="Japanese">...</span>
 
  Erik
 
  Mark Davis wrote:
  
   Done.
  
   From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
   
I would suggest adding a span title="{insert lang name}"/title
   
 Mark Davis wrote:

  http://www.macchiato.com/unicode/Unicode_transcriptions.html
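Erik's LANG suggestion and michka's TITLE suggestion can be combined when generating the list. A minimal Python sketch, assuming the page is produced by a script (a hypothetical setup; the thread does not say how the page is built). `annotate` is an illustrative helper, and `html.escape` keeps markup characters in the transcription or language name from breaking the span.

```python
from html import escape


def annotate(text: str, lang: str, lang_name: str) -> str:
    """Wrap one transcription in a span: LANG guides font choice, TITLE gives the tooltip."""
    return (f'<span lang="{escape(lang)}" title="{escape(lang_name)}">'
            f'{escape(text, quote=False)}</span>')
```

For example, `annotate("ユニコード", "ja", "Japanese")` yields `<span lang="ja" title="Japanese">ユニコード</span>`.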

Re: Transcriptions of Unicode

2000-12-04 Thread Mark Davis


I agree, that is the right thing to do. What wasn't clear from his message
is whether Mozilla picks a reasonable font if the language is not there.
Since NN didn't do this in the past, I was wondering whether that has been
improved.

Mark

- Original Message -
From: [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Monday, December 04, 2000 10:40
Subject: Re: Transcriptions of "Unicode"


 Hi Mark,

 You're right, but I believe what Erik is saying is that you can get
 Japanese-looking characters to be *preferred* over Chinese-looking
 characters (where fonts drawn in both styles are available) by using a
 LANG attribute for a specific page or SPAN. This could increase the
 acceptance of using UTF-8 as a page encoding in Asia.

 Best Regards,

 Addison


 On Sat, 2 Dec 2000, Mark Davis wrote:

  Won't Mozilla pick fonts based on character code? The only ones in the list
  that couldn't be deduced from that would be the Yiddish and the Chinese.
 
  Mark
 
  - Original Message -
  From: "Erik van der Poel" [EMAIL PROTECTED]
  To: "Unicode List" [EMAIL PROTECTED]
  Cc: "Unicode List" [EMAIL PROTECTED]
  Sent: Friday, December 01, 2000 22:46
  Subject: Re: Transcriptions of "Unicode"
 
 
   Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use
   the fonts that have been set up for those languages. E.g.:
  
 <span lang="ja" title="Japanese">...</span>
  
   Erik
  
   Mark Davis wrote:
   
Done.
   
From: "Michael (michka) Kaplan" [EMAIL PROTECTED]

 I would suggest adding a span title="{insert lang name}"/title

  Mark Davis wrote:
 
   http://www.macchiato.com/unicode/Unicode_transcriptions.html

Re: Transcriptions of Unicode

2000-12-04 Thread Erik van der Poel


Mark Davis wrote:
 
 What wasn't clear from his message
 is whether Mozilla picks a reasonable font if the language is not there.

Sorry about the lack of clarity. When there is no LANG attribute in the
element (or in a parent element), Mozilla uses the document's charset as
a fallback. Mozilla has font preferences for each language group. The
language groups have been set up to have a one-to-one correspondence
with charsets (roughly). E.g. iso-8859-1 - Western, shift_jis - ja.
When the charset is a Unicode-based one (e.g. UTF-8), then Mozilla uses
the language group that contains the user's locale's language.

In other words, Mozilla does not (yet) use the Unicode character codes
to select fonts. We may do this in the future.

Erik
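Erik's description of how the language group is chosen can be sketched as a lookup chain. This is an illustrative Python model under stated assumptions: `Element`, `language_group`, the group names, and the abbreviated charset table are all hypothetical, and the LANG value is used directly as the group name, which real code would map more carefully.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, abbreviated table; the message says the real mapping is only "roughly" one-to-one.
CHARSET_TO_GROUP = {
    "iso-8859-1": "western",
    "shift_jis": "ja",
    "euc-jp": "ja",
    "euc-kr": "ko",
    "big5": "zh-tw",
    "gb2312": "zh-cn",
}
UNICODE_CHARSETS = {"utf-8", "utf-16"}


@dataclass
class Element:
    lang: Optional[str] = None
    parent: Optional["Element"] = None


def language_group(element: Element, charset: str, locale_group: str) -> str:
    # 1. LANG on this element or the nearest ancestor that has one.
    node = element
    while node is not None:
        if node.lang:
            return node.lang.lower()  # hypothetical: LANG value used as the group name
        node = node.parent
    cs = charset.lower()
    # 2. A Unicode charset says nothing about language: use the user's locale.
    if cs in UNICODE_CHARSETS:
        return locale_group
    # 3. Otherwise infer the group from the document charset (unknown: hypothetical fallback).
    return CHARSET_TO_GROUP.get(cs, locale_group)
```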

Re: Transcriptions of Unicode

2000-12-04 Thread Michael \(michka\) Kaplan


FWIW, IE does not do an absolutely stellar job here, either. Not all Unicode
subranges have fonts automatically assigned, yet if you bring up the font
dialog, it is smart enough to list the fonts that cover the subrange.

Although there was no "lame" button when I pulled up the dialog, selected
Ethiopic, and saw two fonts listed without IE selecting either, there SHOULD
have been one, because it was awfully lame: smart enough to know a font
is needed, smart enough to list the ones that would work, but too stupid to
just select one? :-(

I am hoping they address this in IE 6.0. No one should ever need this
dialog unless they want to override choices. :-)

michka

a new book on internationalization in VB at
http://www.i18nWithVB.com/

- Original Message -
From: "Erik van der Poel" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Monday, December 04, 2000 10:08 PM
Subject: Re: Transcriptions of "Unicode"


 Mark Davis wrote:
 
  What wasn't clear from his message
  is whether Mozilla picks a reasonable font if the language is not there.

 Sorry about the lack of clarity. When there is no LANG attribute in the
 element (or in a parent element), Mozilla uses the document's charset as
 a fallback. Mozilla has font preferences for each language group. The
 language groups have been set up to have a one-to-one correspondence
 with charsets (roughly). E.g. iso-8859-1 - Western, shift_jis - ja.
 When the charset is a Unicode-based one (e.g. UTF-8), then Mozilla uses
 the language group that contains the user's locale's language.

 In other words, Mozilla does not (yet) use the Unicode character codes
 to select fonts. We may do this in the future.

 Erik

Re: Transcriptions of Unicode

2000-12-02 Thread Mark Davis

Won't Mozilla pick fonts based on character code? The only ones in the list
that couldn't be deduced from that would be the Yiddish and the Chinese.

Mark

- Original Message -
From: "Erik van der Poel" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 22:46
Subject: Re: Transcriptions of "Unicode"

 Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use
 the fonts that have been set up for those languages. E.g.:

   <span lang="ja" title="Japanese">...</span>

 Erik

 Mark Davis wrote:

  Done.

  From: "Michael (michka) Kaplan" [EMAIL PROTECTED]

   I would suggest adding a span title="{insert lang name}"/title

Mark Davis wrote:

 http://www.macchiato.com/unicode/Unicode_transcriptions.html
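Mark's question, and Erik's remark that Japanese could be deduced from Hiragana and Katakana, can be illustrated with a toy lookup from code point to candidate languages. The block ranges below are real Unicode block boundaries, but the table is a hypothetical, heavily abbreviated sketch (Korean use of Han characters, for example, is left out). It shows why Yiddish and Chinese stand out: a Hebrew-block or Han character alone leaves two candidates.

```python
# Hypothetical, abbreviated block table: (first, last, candidate languages).
BLOCKS = [
    (0x0370, 0x03FF, {"el"}),        # Greek and Coptic
    (0x0530, 0x058F, {"hy"}),        # Armenian
    (0x0590, 0x05FF, {"he", "yi"}),  # Hebrew block: Hebrew or Yiddish?
    (0x0E00, 0x0E7F, {"th"}),        # Thai
    (0x30A0, 0x30FF, {"ja"}),        # Katakana
    (0x4E00, 0x9FFF, {"zh", "ja"}),  # CJK Unified Ideographs: Chinese or Japanese?
]


def candidate_languages(text: str) -> list:
    """Intersect the candidate sets of every character found in the table."""
    candidates = None
    for ch in text:
        cp = ord(ch)
        for first, last, langs in BLOCKS:
            if first <= cp <= last:
                candidates = set(langs) if candidates is None else candidates & langs
                break
    return sorted(candidates) if candidates else []
```

Mixing kana with ideographs narrows {zh, ja} to {ja}, while Hebrew-block text on its own stays ambiguous.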

Re: Transcriptions of Unicode

2000-12-02 Thread Mark Davis


By the way, Erik, I got NN6 to run, but it does some weird things with
pages. Take a look at my homepage (http://www.macchiato.com/) in NN6,
compared to NN4.7 or IE5.5. Also, my JavaScript converter doesn't work in
it, although it does in NN4.7 and IE5.5.

Mark

- Original Message -
From: "Erik van der Poel" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 22:46
Subject: Re: Transcriptions of "Unicode"


 Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use
 the fonts that have been set up for those languages. E.g.:

   <span lang="ja" title="Japanese">...</span>

 Erik

 Mark Davis wrote:
 
  Done.
 
  From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
  
   I would suggest adding a span title="{insert lang name}"/title
  
Mark Davis wrote:
   
 http://www.macchiato.com/unicode/Unicode_transcriptions.html

Re: Transcriptions of Unicode

2000-12-01 Thread Tex Texin


Sad to report, my browser (Netscape 4.7) shows the Yiddish as
Daw-key-nu-ye (It's left to right not rtl...)

I am using the Monotype Andale Duospace font.
tex

Mark Davis wrote:
 
 I am interested in collecting transcriptions of the word "Unicode" in
 different scripts (and languages). If you are fluent in a language other
 than Unicode, I'd appreciate any suggestions. What I have so far is at:
 http://www.macchiato.com/unicode/Unicode_transcriptions.html
 
 Mark
 ___
 Mark Davis, IBM Center for Java Technology, Cupertino
 (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED]
 http://maps.yahoo.com/py/maps.py?Pyt=Tmapaddr=10275+N.+De+Anzacsz=95014

-- 

--
Tex Texin  Director, International Business
mailto:[EMAIL PROTECTED]  +1-781-280-4271 Fax:+1-781-280-4655
Progress Software Corp.  14 Oak Park, Bedford, MA 01730

http://www.Progress.com        #1 Embedded Database
http://www.SonicMQ.com         #1 Performing JMS Messaging
http://www.ASPconnections.com  #1 provider in the ASP marketplace
http://www.NuSphere.com        Open Source software and services for MySQL

Globalization Program   
http://www.Progress.com/partners/globalization.htm
---
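Displaying Hebrew-script text such as Yiddish correctly requires the bidirectional algorithm (Unicode Standard Annex #9). As a minimal sketch, the paragraph direction can be taken from the first strong character, following rules P2 and P3 of that annex in simplified form (isolates and explicit embeddings are ignored here). `base_direction` is a hypothetical helper; it only decides the direction and does not reorder anything, and the Hebrew letters in the usage note are placeholders, not the Yiddish transcription from the page.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strong character (simplified UAX #9 P2/P3)."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):  # Hebrew letters are R, Arabic letters are AL
            return "rtl"
    return "ltr"  # no strong character: default to left-to-right
```

For example, `base_direction("\u05D9\u05D5")` returns "rtl" and `base_direction("Unicode")` returns "ltr"; a renderer that skips this step, as NN 4.7 apparently did here, draws the Hebrew letters left to right.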

Re: Transcriptions of Unicode

2000-12-01 Thread Mark Davis

Done.

- Original Message -
From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 15:19
Subject: Re: Transcriptions of "Unicode"

 IE 5.0, 5.5, NN 6.0, and the latest build of Mozilla all do the right
 thing with the word.

 So that would be the fault of your browser choice. :-)

 I would suggest adding a span title="{insert lang name}"/title around
 each lang name, as it will cause IE to show the language name in a tooltip
 when you hover the mouse, after a slight delay. That lets people guess the
 languages and then see if their guesses were right. Always a nice
 effect...

 michka

 - Original Message -
 From: "Tex Texin" [EMAIL PROTECTED]
 To: "Unicode List" [EMAIL PROTECTED]
 Cc: "Unicode List" [EMAIL PROTECTED]
 Sent: Friday, December 01, 2000 2:30 PM
 Subject: Re: Transcriptions of "Unicode"

  Sad to report, my browser (Netscape 4.7) shows the Yiddish as
  Daw-key-nu-ye (It's left to right not rtl...)

  I am using the Monotype Andale Duospace font.
  tex

  Mark Davis wrote:

   I am interested in collecting transcriptions of the word "Unicode" in
   different scripts (and languages). If you are fluent in a language
   other than Unicode, I'd appreciate any suggestions. What I have so far is
   at:
   http://www.macchiato.com/unicode/Unicode_transcriptions.html

   Mark

