Re: ü

2002-01-22 Thread Michael Everson

At 20:09 -0500 2002-01-21, Patrick Andries wrote:
Kenneth Whistler wrote:

Patrick Andries wrote:


I must say that I have already seen horrors such as geüpdated (the u
is presumably approximated), again English messing with languages'
spelling and pronunciation...

Languages don't mess with languages. People mess with languages.

It isn't as if French hasn't been polluting English for a thousand
years or anything, is it?!

No, no, no. French has enriched English, not polluted it, by 
bestowing on it a wealth of new words. I wonder if we could start the 
millennium celebration of this wonderful hybridization before 2066?

Yes. French has given us things like fin de siècle. And English has 
given you le weekend.

And nowadays, the Europeans are getting their revenge by exporting
all their accents back onto English letters.

Well, the Americans are putting up a pretty good fight. Can't see the 
light behind façade, cañon and coöperate. Tsk tsk.

Coöperate isn't very common, but naïve is.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Norwegian sorting

2002-01-22 Thread Keld Jørn Simonsen

On Mon, Jan 21, 2002 at 06:14:55PM +0100, Stefan Persson wrote:
 - Original Message -
 From: Lars Marius Garshol [EMAIL PROTECTED]
 To: Unicoders [EMAIL PROTECTED]
 Sent: den 21 januari 2002 15:16
 Subject: Re: Norwegian sorting
 
 
  I doubt that there is an official standard for this, but I would
  expect to find Ü sorted with Y, given that Norwegian Y is pronounced
  just like Swedish/German/Dutch Ü. Many reference works sort V and W
  together, for example, according to the same principle.
 
 Swedish: Ü only used in German loan words.
 German: Ü pronounced as a Swedish y.
 Dutch: Ü pronounced completely differently.
 
 In Swedish we sort the German ü as y, and the Dutch ü as u.

I have no official record on Dutch ü being sorted as u in Swedish.
Where do you get this rule from? Have you got examples of this?
How do you accomplish it?
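
One way a "ü files with y" rule could be accomplished is a sort key that maps letters to ranks before comparing. The following is only a minimal sketch: the alphabet order, the EQUIV table and the name sort_key are assumptions for illustration and are not taken from NS 4103 or any Swedish standard. Input is assumed to use precomposed characters. Telling a German ü from a Dutch ü would need per-word language information that a plain key function does not have.

```python
# Hypothetical Norwegian-style alphabet: a-z followed by æ, ø, å.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}
# Assumed equivalences: ü files with y, w files with v.
EQUIV = {"ü": "y", "w": "v"}

def sort_key(word):
    """Primary key from base letters; the lowercased word breaks ties."""
    lowered = word.lower()
    # Letters outside the alphabet get a rank after å.
    primary = [RANK.get(EQUIV.get(ch, ch), len(ALPHABET)) for ch in lowered]
    return (primary, lowered)

words = ["Zürich", "Ulm", "Ystad", "Über", "Ås", "Yacht"]
print(sorted(words, key=sort_key))
```

With this key, Über lands among the Y entries rather than next to Ulm.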

Kind regards
Keld Simonsen




Re: Unicode 3.2 Beta Period Finishing

2002-01-22 Thread Michael Everson

Regarding
 
http://www.unicode.org/Public/BETA/Unicode3.2/Scripts-3.2.0d7.txt
 
Scripts-3.2.0d7.txt  21-Jan-2002 13:57  39k

It says:
03D0..03F5; GREEK # L  [38] GREEK BETA SYMBOL..GREEK LUNATE EPSILON SYMBOL

In the first place, 03E2 through 03EF are COPTIC letters, not Greek. 
In the second, some of those letters are technical symbols, not 
letters, so if you are including some of them why not include 03F6?
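
The range in that line can be checked mechanically. A minimal sketch, assuming the line layout quoted above (range; script # category [count] names); the helper name parse_script_line is hypothetical:

```python
import re

LINE = "03D0..03F5; GREEK # L  [38] GREEK BETA SYMBOL..GREEK LUNATE EPSILON SYMBOL"

def parse_script_line(line):
    """Split a Scripts-style data line into (first, last, script, declared_count)."""
    m = re.match(r"([0-9A-F]+)\.\.([0-9A-F]+);\s*(\w+)\s*#\s*\w+\s*\[(\d+)\]", line)
    if m is None:
        raise ValueError("unrecognized line: " + line)
    first, last = int(m.group(1), 16), int(m.group(2), 16)
    return first, last, m.group(3), int(m.group(4))

first, last, script, declared = parse_script_line(LINE)
span = last - first + 1                      # code points covered by the range
coptic = range(0x03E2, 0x03EF + 1)           # the letters identified as Coptic above
inside = all(first <= cp <= last for cp in coptic)
print(script, span, declared, len(coptic), inside, first <= 0x03F6 <= last)
```

The span agrees with the declared count of 38, all fourteen code points 03E2..03EF fall inside the GREEK range, and 03F6 falls just outside it.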
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Devanagari on MacOS 9.2 and IE 5.1

2002-01-22 Thread [EMAIL PROTECTED]

I spoke too fast. Upon taking a closer look at the file, the font was not set properly. 
MacOS 9.2, Indian Language Kit, Mac IE 5.1 and Devanagari MT as font face seem to 
display UTF-8 encoded Hindi just fine.

Etienne

Date: Mon, 21 Jan 2002 10:24:16 -0800
 [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED], 
[EMAIL PROTECTED]: [EMAIL PROTECTED]
 RE: Devanagari

On this subject, Win2K and IE5+ seem to do a nice job displaying UTF8-encoded Hindi. 
On the Mac, the Indian Language Kit provides for OS support and fonts (with MacOS 9.2 
and above), but I have not been able to display Hindi (UTF8 encoded) with Mac's IE 
5.1. Am I correct in assuming that the Mac version of IE does not support Hindi 
without a hack?

Etienne

Reply-To: [EMAIL PROTECTED]
Christopher J Fynn [EMAIL PROTECTED] [EMAIL PROTECTED]
Cc: Aman Chawla [EMAIL PROTECTED]
RE: Devanagari
Date: Mon, 21 Jan 2002 23:59:38 +0600

Aman

Here in Bhutan the Internet connection is still much worse than in most
places I've visited in India & Nepal (and the cost per minute is several
times higher) - believe me even then UTF-8 (or UTF-16) encoded pages do not
display noticeably slower than ASCII, ISCII or 8-bit font encoded pages -
and I don't need to download any special plug-ins or fonts.

- Chris

--
Christopher J Fynn
Thimphu, Bhutan

[EMAIL PROTECTED]
[EMAIL PROTECTED]


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
 Behalf Of Aman Chawla
 Sent: 21 January 2002 10:57
 To: James Kass; Unicode
 Subject: Re: Devanagari


 - Original Message -
 From: James Kass [EMAIL PROTECTED]
 To: Aman Chawla [EMAIL PROTECTED]; Unicode
 [EMAIL PROTECTED]
 Sent: Monday, January 21, 2002 12:46 AM
 Subject: Re: Devanagari


  25% may not be 300%, but it isn't insignificant.  As you note, if the
  mark-up were removed from both of those files, the percentage of
  increase would be slightly higher.  But, as connection speeds continue
  to improve, these differences are becoming almost minuscule.

 With regard to South Asia, where the most widely used modems are approx.
 14 kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly
 unheard of, efficiency in data transmission is of paramount importance...
 how can we convince the South Asian user to create websites in an encoding
 that would make his client's 14 kbps modem as effective (rather,
 ineffective) as a 4.6 kbps modem?
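
The 4.6 kbps figure appears to come from dividing 14 kbps by the three bytes UTF-8 needs per Devanagari character (14 / 3 is about 4.67). A rough sketch of that arithmetic follows; the sample word and the one-byte-per-character legacy baseline are assumptions, and ASCII mark-up, which costs one byte either way, is ignored:

```python
# Assumed sample text: the word "Hindi" in Devanagari, six code points.
sample = "\u0939\u093f\u0928\u094d\u0926\u0940"

utf8_bytes = len(sample.encode("utf-8"))
utf16_bytes = len(sample.encode("utf-16-le"))   # no byte order mark
legacy_bytes = len(sample)        # assumes an 8-bit encoding, one byte per character

ratio = utf8_bytes / legacy_bytes
modem_kbps = 14.0
effective_kbps = modem_kbps / ratio   # rate for pure Devanagari text only
print(utf8_bytes, utf16_bytes, legacy_bytes, round(effective_kbps, 2))
```

This is the worst case. A page that mixes Devanagari with mark-up grows by less, which is consistent with the 25% figure quoted above for files that include mark-up.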








RE: The benefit of a symbol for 2 pi

2002-01-22 Thread Sampo Syreeni

On Sat, 19 Jan 2002, Murray Sargent wrote:

Capital pi is to product as capital sigma is to summation.

But if I'm not mistaken, Unicode already has a separate character for
n-ary products and summation (U+220F, U+2211), distinct from the capital
Greek letters *and* the variant forms in the mathematical alphanumeric
block. If capital pi is the way to go, why not use U+1D6F1 MATHEMATICAL
ITALIC CAPITAL PI or U+1D72B MATHEMATICAL BOLD ITALIC CAPITAL PI, for
instance?
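
The code points mentioned here can be checked against the character names in Python's unicodedata module. A small sketch, assuming the interpreter bundles Unicode 3.1 or later (the version that added the mathematical alphanumeric block):

```python
import unicodedata

# Greek capitals, the n-ary operators, and two mathematical alphanumeric variants.
code_points = [0x03A0, 0x03A3, 0x220F, 0x2211, 0x1D6F1, 0x1D72B]
names = {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp, name in names.items():
    print(f"U+{cp:04X} {unicodedata.category(chr(cp))} {name}")
```

The n-ary operators are math symbols (category Sm), while the Greek capital letters are ordinary uppercase letters (category Lu).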

Sampo Syreeni, aka decoy - mailto:[EMAIL PROTECTED], tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2





Re: Norwegian sorting

2002-01-22 Thread Keld Jørn Simonsen

On Mon, Jan 21, 2002 at 11:11:43AM -0500, Tex Texin wrote:
 Thanks Keld, that was one of the sources I checked first.
 
 I saw that it was based on a Norwegian standard, but it didn't say what
 the standard was used for. So I didn't know if this was a collation that
 dictionaries or phone books used, or who used it.

NS 4103 covers normal sorting and filing rules, in the style that
was standardized 30 years ago. NS 4103 is available so you can
check it yourself.

Kind regards
Keld




RE: Devanagari

2002-01-22 Thread Marco Cimarosti

David Starner wrote:
 On Mon, Jan 21, 2002 at 02:20:17PM +0100, Marco Cimarosti wrote:
  What this means in practice for website developers is:
  
  1) SCSU text can only be edited with a text editor which properly decodes
  the *whole* file on load and re-encodes it on save. On the other hand,
  UTF-8 text can also be edited using an encoding-unaware editor, although
  non-ASCII text is invisible.
 
 True for users of Latin-based writing systems. Probably of little
 comfort to users of Indic or Chinese-based writing systems.

I was referring to the task of editing *source* files in HTML, XML, or other
computer languages and formats. Most of the time, programmers and webmasters
are interested in changing the ASCII part of the file (mark-up,
instructions), which is the part most likely to contain bugs to be
fixed, or to need changes unrelated to the linguistic content.

Of course, the people in charge of writing the *content* need tools that
can display the actual characters. And this is true for users of Latin-based
writing systems as well: imagine writing in French or German with all
occurrences of é, è, ä, ö, ü, etc. transformed into pairs of funny bytes.
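
The "pairs of funny bytes" are easy to reproduce. A minimal sketch, assuming the encoding-unaware editor interprets the file as Latin-1 (other 8-bit code pages would show different garbage):

```python
text = "é è ä ö ü"                      # precomposed letters separated by spaces
raw = text.encode("utf-8")              # what is stored in the file
seen = raw.decode("latin-1")            # what an editor assuming Latin-1 would show
print(len(text), len(raw))
print(seen)

markup = "<p class=\"x\">"
print(markup.encode("utf-8") == markup.encode("ascii"))   # ASCII mark-up is unchanged
```

Each accented letter becomes two bytes, while the ASCII mark-up is byte-for-byte the same in both views, which is why it stays editable.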

 Better to stick with editors that are aware of your encoding.

Of course. Provided that one exists on your platform, and that you are not
bound to development tools which don't support it.

  2) SCSU text cannot be built by assembling binary pieces coming from
  external sources.
 
 It's not really designed for that. If you're assembling things, just
 run the output through a UTF-8 to SCSU converter.

Which translates to: SCSU is not appropriate for dynamic HTML pages, or for
encoding text inside any other kind of application.

More generally, SCSU is not appropriate as a text encoding, but just as a
compression method for documents in their final form.
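
The assembling point can be illustrated for UTF-8, which carries no state from one character to the next: pieces encoded independently can be joined as raw bytes. SCSU itself is not shown, because Python's standard library has no SCSU codec. The fragment names below are hypothetical:

```python
# Three pieces, each encoded on its own, as if they came from separate sources.
header = "<p lang=\"hi\">".encode("utf-8")
body = "\u0939\u093f\u0928\u094d\u0926\u0940".encode("utf-8")   # a Hindi word
footer = "</p>".encode("utf-8")

page = header + body + footer            # plain byte concatenation
decoded = page.decode("utf-8")           # still valid UTF-8 as a whole
print(len(page), decoded.startswith("<p"), decoded.endswith("</p>"))
```

With a stateful compression scheme, the meaning of a byte depends on what came before it, so pieces compressed separately cannot in general be joined this way without re-encoding.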

Ciao.
_ Marco




Re: RE: ü

2002-01-22 Thread Patrick Andries





Marco Cimarosti wrote:
  Patrick Andries wrote:
  Funny: I have just read a similar but opposite opinion on an Italian
  newsgroup. Somebody said: if we really must accept English terms such as
  "file" or "window", we should at least make the effort of pronouncing them
  according to Italian spelling: /'file/ and /vin'dOv/, rather than /'fail/
  or /'windo:/.
  
It is an alternative approach. In fact, I believe in a middle way: spell
the words as they are pronounced in your language, which is usually not the
same as in the original. Very few Germans pronounce English loan-words in
German as native English speakers would (even allowing for the wealth of
English pronunciations).

  A way to say welcome.
  
  Uhmm... I hope such a way of saying welcome will never be applied to humans.
  In case I move to China, I would not like to have my hair painted black
  and the shape of my eyes modified with surgery.  :-)
  
Remember the old adage : when in Rome...
  
Patrick
  
  
  


Re: Unicode 3.2 Beta Period Finishing

2002-01-22 Thread Rick McGowan

Doug Ewell reported:

 Many of the embedded images in the Standardized Variants
 document are missing.

The missing images have been fixed.

Rick




Re: Unicode 3.2 Beta Period Finishing

2002-01-22 Thread Mark Davis (jtcsv)

Currently, the Coptic characters are treated as extensions to the Greek
script, much as the Urdu characters are treated as extensions to the Arabic
script. So for now, at least, they should be marked as Greek. If the UTC and
SC2 ever disunify the scripts, then the Script property value would need to
change.

As to the technical symbols, would anyone take a stab at listing those
characters that are only ever used as technical symbols, and never as
letters?

Mark
—

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο 
πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

- Original Message -
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, January 22, 2002 05:10
Subject: Re: Unicode 3.2 Beta Period Finishing


 Regarding
 

 http://www.unicode.org/Public/BETA/Unicode3.2/Scripts-3.2.0d7.txt
 Scripts-3.2.0d7.txt  21-Jan-2002 13:57  39k

 It says:
 03D0..03F5; GREEK # L  [38] GREEK BETA SYMBOL..GREEK LUNATE EPSILON SYMBOL

 In the first place, 03E2 through 03EF are COPTIC letters, not Greek.
 In the second, some of those letters are technical symbols, not
 letters, so if you are including some of them why not include 03F6?
 --
 Michael Everson *** Everson Typography *** http://www.evertype.com







Re: Devanagari on MacOS 9.2 and IE 5.1

2002-01-22 Thread Yung-Fong Tang



It should also be fine on Netscape 6.2.

[EMAIL PROTECTED] wrote:
  I spoke too fast. Upon taking a closer look at the file, the font was not
  set properly. MacOS 9.2, Indian Language Kit, Mac IE 5.1 and Devanagari MT
  as font face seem to display UTF-8 encoded Hindi just fine.

  Etienne

  Date: Mon, 21 Jan 2002 10:24:16 -0800
  "[EMAIL PROTECTED]" [EMAIL PROTECTED] [EMAIL PROTECTED],
  [EMAIL PROTECTED]: [EMAIL PROTECTED]
  RE: Devanagari

  On this subject, Win2K and IE5+ seem to do a nice job displaying
  UTF8-encoded Hindi. On the Mac, the Indian Language Kit provides for OS
  support and fonts (with MacOS 9.2 and above), but I have not been able to
  display Hindi (UTF8 encoded) with Mac's IE 5.1. Am I correct in assuming
  that the Mac version of IE does not support Hindi without a hack?

  Etienne

  Reply-To: [EMAIL PROTECTED]
  "Christopher J Fynn" [EMAIL PROTECTED] [EMAIL PROTECTED]
  Cc: "Aman Chawla" [EMAIL PROTECTED]
  RE: Devanagari
  Date: Mon, 21 Jan 2002 23:59:38 +0600

  Aman

  Here in Bhutan the Internet connection is still much worse than in most
  places I've visited in India & Nepal (and the cost per minute is several
  times higher) - believe me even then UTF-8 (or UTF-16) encoded pages do not
  display noticeably slower than ASCII, ISCII or 8-bit font encoded pages -
  and I don't need to download any special plug-ins or fonts.

  - Chris

  --
  Christopher J Fynn
  Thimphu, Bhutan

  [EMAIL PROTECTED]
  [EMAIL PROTECTED]

  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
  Behalf Of Aman Chawla
  Sent: 21 January 2002 10:57
  To: James Kass; Unicode
  Subject: Re: Devanagari

  - Original Message -
  From: "James Kass" [EMAIL PROTECTED]
  To: "Aman Chawla" [EMAIL PROTECTED]; "Unicode" [EMAIL PROTECTED]
  Sent: Monday, January 21, 2002 12:46 AM
  Subject: Re: Devanagari

  25% may not be 300%, but it isn't insignificant.  As you note, if the
  mark-up were removed from both of those files, the percentage of
  increase would be slightly higher.  But, as connection speeds continue
  to improve, these differences are becoming almost minuscule.

  With regard to South Asia, where the most widely used modems are approx.
  14 kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is
  mostly unheard of, efficiency in data transmission is of paramount
  importance... how can we convince the South Asian user to create websites
  in an encoding that would make his client's 14 kbps modem as effective
  (rather, ineffective) as a 4.6 kbps modem?
  
  