Re: the Ethnologue

2000-09-13 Thread Jörg Knappen
Rick McGowan asked: Can anyone point me to an existing list of languages that is more comprehensive and better researched than the Ethnologue? If there is no such list, then we don't need to consider any alternatives, right? Ask the closest university department of comparative

Re: surrogate terminology

2000-09-13 Thread Mark Davis
Not all code points are assigned (or even assignable) to characters. U+xx is used to refer to code points, which range from 0 to 10FFFF. Of these code points, some are assigned to characters (including regular characters, control characters, format characters, and private use characters

Re: the Ethnologue

2000-09-13 Thread John Hudson
Rick McGowan wrote: One of the major PROBLEMS with ISO 639, and other such lists developed by ISO over the years, is that they are not brought into being, or maintained, with the intent of being comprehensive. They are either intended to, or do serve, some short-term narrow interests.

Re: Ethnologue

2000-09-13 Thread Michael Everson
At 09:19 -0800 2000-09-12, [EMAIL PROTECTED] wrote: First, by the definitions assumed in the Ethnologue, they are all considered to be distinct languages; they would be candidates for separate literacy and literature development (if currently spoken-only), and if literature were to be

Re: the Ethnologue

2000-09-13 Thread Michael Everson
At 23:56 +0100 2000-09-12, Christopher J. Fynn wrote: A lot of what are listed as "languages" in the Ethnologue are what most people would call dialects. For instance almost every known dialect of spoken Tibetan is listed as a separate language in the Ethnologue although they all share only

Re: the Ethnologue

2000-09-13 Thread Misha Wolf
The Library of Congress is very closely involved with ISO 639-2. In fact, it is mostly their list of codes. Misha Oh Michael... I think there are codes given to entities in the Ethnologue list that aren't languages in the sense that we need to identify languages in IT and in

RE: surrogate terminology (was Re: Surrogate support in *ML?

2000-09-13 Thread Marco . Cimarosti
Peter Constable wrote: - code values: integers within the space of some encoding form; d800 - dfff *are* code values, but not codepoints - surrogate: I'm inclined to say that this should refer *only* to a UTF-16 code value in the range d800 - dfff; equal to "surrogate code value" -
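
The range in the terminology above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this illustration; the constants (surrogate code values d800 - dfff, code points 0 to 10FFFF) are those of UTF-16 as standardized, not something proposed in this thread.

```python
# Illustration only: helper names are hypothetical, not from the thread.
MAX_CODE_POINT = 0x10FFFF

def is_surrogate_code_value(value):
    """True if a 16-bit UTF-16 code value lies in the range d800 - dfff."""
    return 0xD800 <= value <= 0xDFFF

def to_utf16_code_values(code_point):
    """Return the UTF-16 code values (one or two) for a single code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("code point out of range")
    if is_surrogate_code_value(code_point):
        raise ValueError("d800 - dfff are not assignable to characters")
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]
```

For example, `to_utf16_code_values(0x10000)` gives the pair d800, dc00, while any code point below 10000 (hex) outside the surrogate range maps to a single code value.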

Tagging orthographic systems (was: (iso639.186) the Ethnologue)

2000-09-13 Thread Otto Stolz
On 2000-09-12 at 17:43 UTC, Peter Constable wrote: ISO 639 codes were primarily intended for bibliography purposes. Gary and I point out in our paper that the needs of that sector do not necessarily correspond to the general needs of IT, particularly for language-specific

*New* Spanish Collation

2000-09-13 Thread Nat Langs
Hello, I have heard that recently there has been a change to the Spanish collation. Where can I obtain more information on this? Is this based on a given standard? Thanks. Regards, Nat

Re: *New* Spanish Collation

2000-09-13 Thread Michael \(michka\) Kaplan
Well, there are two collations... the "traditional" one and the "modern" or "international" one. The primary difference between them is that just about all of the exceptions in the traditional one have been eliminated in the modern one. This was not a recent change though, it was several
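
A rough sketch of the difference between the two collations follows. It assumes the best-known "exception" of the traditional order, namely that "ch" and "ll" sort as separate letters after "c" and "l"; the message above does not spell out which exceptions it means. The alphabet tables and function names are hypothetical, and accented vowels are ignored for brevity.

```python
import re

# Assumption: traditional order treats the digraphs "ch" and "ll" as letters.
TRADITIONAL = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
               "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u",
               "v", "w", "x", "y", "z"]
# Modern order: the same alphabet without the digraph letters.
MODERN = [letter for letter in TRADITIONAL if len(letter) == 1]

def sort_key(word, alphabet):
    """Map a word to a list of letter ranks under the given alphabet."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    digraphs = [letter for letter in alphabet if len(letter) > 1]
    # Try digraphs first, then fall back to any single character.
    pattern = re.compile("|".join(digraphs + ["."]))
    # Characters not in the table (e.g. accented vowels) sort last here.
    return [rank.get(tok, len(alphabet)) for tok in pattern.findall(word.lower())]

def spanish_sort(words, alphabet):
    return sorted(words, key=lambda w: sort_key(w, alphabet))
```

With the words cuna, chico, dama, llave, luz, `spanish_sort(words, TRADITIONAL)` places "chico" after "cuna" and "llave" after "luz", whereas `spanish_sort(words, MODERN)` orders them letter by letter.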

Re: the Ethnologue

2000-09-13 Thread John Hudson
At 02:10 AM 9/14/2000 -0700, [EMAIL PROTECTED] wrote: The problem here is that ISO639 has, for better or worse, been adopted by a wide array of DIFFERING applications. It's a convenience standard that we vaguely have to live with. No, it's an inconvenience standard that we vaguely have to live

Re: the Ethnologue

2000-09-13 Thread Rick McGowan
Re the Linguasphere, Peter C wrote: - As Chris mentioned, the info isn't available online. Actually, the Linguasphere is available on-line, if you pay for it... One hundred sixty pounds sterling (two hundred seventy-five US dollars) for a license to use the electronic version. Rick

RE: the Ethnologue

2000-09-13 Thread Ayers, Mike
With English, the problem with spell checking is quite different, and different lists of words would not be as easy for a solution: the en-US vs. en-GB tagging does not seem to adequately cover the various differences such as -ise vs. -ize, -our vs. -or, -re vs. -er, use of shall vs.

Tagging orthographic systems (was: (iso639.186) the Ethnologue)

2000-09-13 Thread Rick McGowan
Otto Stolz wrote: I think, the ethnologue lacks information about variant orthographies. Yes, it does. But that's OK, because we can make a composite tagging system that tags orthography separately from language. So... does anyone have a comprehensive list of orthographies? Rick

Re: Tagging orthographic systems (was: (iso639.186) the

2000-09-13 Thread Rick McGowan
Tom Emerson wrote: One (well, the only) problem I have with explicit orthographic tagging is that it makes assumptions that a consistent orthography is being used throughout a document, which isn't necessarily the case. This is particularly prevalent in East Asian languages: Well, the tags

RE: the Ethnologue

2000-09-13 Thread Ayers, Mike
From: Arnt Gulbrandsen [mailto:[EMAIL PROTECTED]] Are there valid reasons why the imperfect but comprehensive needs to be a standard? I can see one reason for it _not_ to be a standard: A list can be added to faster, so it's easier for a list to be truly comprehensive.

Re: surrogate terminology

2000-09-13 Thread Peter_Constable
On 09/13/2000 01:47:57 AM Mark Davis wrote: Not all code points are assigned (or even assignable) to characters. U+xx is used to refer to code points, which range from 0 to 10FFFF. Of these code points, some are assigned to characters (including regular characters, control characters,

FWD: SOS...........help..........

2000-09-13 Thread Peter_Constable
Forwarded by Peter Constable/IntlAdmin/WCT on 09/13/2000 01:21 PM. From: "Sandeep Krishna" (sandeepkrishna@noida.hclt.com)

Re: the Ethnologue

2000-09-13 Thread John Cowan
Michael Everson wrote (amplified by me): tire, civilize, color, center (US); tyre, civilize, colour, centre (GB-Oxonia); tyre, civilise, colour, centre (GB-Demotica); tire, civilise, colour, centre (CA). I have seen a photograph of an actual Canadian sign saying "Tire Centre", which in GB

Re: the Ethnologue

2000-09-13 Thread Misha Wolf
It takes a long time for data to work its way into an ISO standard. This generalisation is unhelpful. Consider ISO 4217, the currency code standard. As soon as the Maintenance Agency (MA) has been notified by a competent authority (in this case, a central bank) of a legitimate currency

Re: *New* Spanish Collation

2000-09-13 Thread Antoine Leca
Antoine Leca wrote 1/4 hour ago: Nat Langs wrote: I have heard that recently there has been a change to the Spanish collation, where can I obtain more information on this? I am presently unable to direct you to the specific decision (of the reunion of the Academias de la Lengua

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 01:39:37 AM Jörg Knappen wrote: I once looked at the Ethnologue and its subdivision of the German language is just ridiculous. Not small errors, a gross misconception. I don't trust the Ethnologue in areas where I don't know the facts well, since it fails in one area where I know

Re: *New* Spanish Collation

2000-09-13 Thread Antoine Leca
Nat Langs wrote: I have heard that recently there has been a change to the Spanish collation, where can I obtain more information on this? If you are able to read Spanish, look at URL:http://www.rae.es/NIVEL1/LEMAS/ch.htm URL:http://www.rae.es/NIVEL1/NORMAS.HTM I am presently

Re: *New* Spanish Collation

2000-09-13 Thread Michael \(michka\) Kaplan
FWIW, this is indeed the collation supported by LCID 0x0C0A (3082) under Windows, named "Spanish - Modern" under most versions of Windows and "Spanish - International" under Windows 2000. The other collation is "Spanish - Traditional" and its LCID is 0x040A (1034). All functions under Windows
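
As a small illustration of the two LCIDs mentioned above, the sketch below splits an LCID into its fields using the bit layout of the Windows LANGID/LCID macros (primary language in the low 10 bits, sublanguage in the next 6, sort ID above the low 16 bits). The function name and the dictionary keys are hypothetical; only the two LCID values come from the message.

```python
# Illustration only: split_lcid is a hypothetical helper, not a Windows API.
def split_lcid(lcid):
    """Split a Windows LCID into primary language, sublanguage and sort ID."""
    lang_id = lcid & 0xFFFF               # low 16 bits: language identifier
    return {
        "primary": lang_id & 0x3FF,       # low 10 bits: primary language
        "sublang": lang_id >> 10,         # upper 6 bits: sublanguage
        "sort_id": (lcid >> 16) & 0xF,    # 4 bits above the language ID
    }

SPANISH_MODERN = 0x0C0A       # decimal 3082
SPANISH_TRADITIONAL = 0x040A  # decimal 1034
```

Both values share the primary language 0x0A (Spanish) and differ only in the sublanguage field: 3 for the modern collation and 1 for the traditional one.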

FW: compliance with sql

2000-09-13 Thread Magda Danish (Unicode)
-Original Message- From: Gary Deleel [mailto:[EMAIL PROTECTED]] Sent: Wednesday, September 13, 2000 1:35 PM To: [EMAIL PROTECTED] Subject: compliance with sql To you, Basically I have been given the job of finding what problems lie within SQL Server when Unicode is involved.

Re: Tagging orthographic systems (was: (iso639.186) the

2000-09-13 Thread Rick McGowan
My point is that for some languages there is no single orthography that can ever be nailed down. Yes of course. But there's nothing to prevent the development of a system of orthographic tags, and nothing to prevent combining orthographic tags with language tags for complete mix-and-match

FW: Printing issues

2000-09-13 Thread Magda Danish (Unicode)
-Original Message- From: Dieter Hoffmann [mailto:[EMAIL PROTECTED]] Sent: Wednesday, September 13, 2000 12:43 PM To: [EMAIL PROTECTED] Subject: Printing issues Dear Unicode People, Are there known issues between the way AMD K6/2 handles Unicode when sent to printer by Office97? In

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 02:17:52 AM John Hudson wrote: The first tasks should be to a) identify the different kinds of information that need to be represented by tags (spoken languages, written languages, literary languages (not the same thing as a written languages), particular orthographies,

Re: Tagging orthographic systems (was: (iso639.186) the

2000-09-13 Thread Kenneth Whistler
Tom Emerson wrote: One (well, the only) problem I have with explicit orthographic tagging is that it makes assumptions that a consistent orthography is being used throughout a document, which isn't necessarily the case. This is particularly prevalent in East Asian languages: Japanese

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
(Apologies for the cross-listing, but this has spanned several lists, and there are parties on each that are not all on one and that are interested in the discussion.) On 09/13/2000 06:37:02 AM Michael Everson wrote: At 23:56 +0100 2000-09-12, Christopher J. Fynn wrote: A lot of what are

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 10:25:21 AM Antoine Leca wrote: While I agree with you, there are anyway problems with the way languages are distinguished... Some comments in response: - This is not primarily about major languages. They generally already have the identifiers they need. In addition, because of

Re: Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 06:37:25 AM Michael Everson wrote: First, by the definitions assumed in the Ethnologue, they are all considered to be distinct languages; they would be candidates for separate literacy and literature development (if currently spoken-only), and if literature were to be developed,

FWD: Unicode Indian languages (was Re: Tamil glyphs)

2000-09-13 Thread Marco Cimarosti
Paresh Agarwal wrote (privately): I am really enthusiastic to know about Unicode in depth. Can you suggest how to go about all this? I learned a lot by asking questions on the Unicode List ([EMAIL PROTECTED]). There are plenty of people there who can answer your questions in detail. All the best

Re: Tagging orthographic systems (was: (iso639.186) the

2000-09-13 Thread Peter_Constable
On 09/13/2000 10:49:49 AM John Hudson wrote: Would it be too radical to suggest that 'language codes', per se, are one of the least useful things for IT tagging? A blind code, that offers no information about orthography, script variant, or even whether a language is written at all, simply does

Re: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 11:59:01 AM Rick McGowan wrote: Re the Linguasphere, Peter C wrote: - As Chris mentioned, the info isn't available online. Actually, the Linguasphere is available on-line, if you pay for it... One hundred sixty pounds sterling (two hundred seventy-five US dollars) for a license

RE: the Ethnologue

2000-09-13 Thread Peter_Constable
On 09/13/2000 12:04:24 PM "Ayers, Mike" wrote: What I'd really like to know is why there seems to be this insistence on only one official list of languages when there appears to be a clear need for two. There appears to be interest for a comprehensive, if imperfect, list on one hand,

Re: Tagging orthographic systems (was: (iso639.186) the Ethnologue)

2000-09-13 Thread Peter_Constable
On 09/13/2000 09:09:12 AM Otto Stolz wrote: For many language-specific IT processes involving written language, such as spell-checking, hyphenating, transliterating (e. g. to Braille), or audible rendering, it is not enough to know which language you are dealing with: you also need information

Re: Tagging orthographic systems (was: (iso639.186) the

2000-09-13 Thread John Hudson
Tom Emerson wrote: Rick McGowan writes: Well, the tags don't force one to tag an entire document with the same tag. They just provide a space. It's up to the document format to decide whether to tag a whole document or tag by language/orthography runs. That isn't the point I'm trying

Re: Tamil glyphs

2000-09-13 Thread Antoine Leca
Marco Cimarosti wrote: Antoine Leca wrote: I am not sure this is the only way to interpret the use of ZWNJ here. Another way would be to consider the sequence ka+halant to be a separate syllable, and then ka+i to be a second syllable. Then, the correct rendering would be

Re: Tagging orthographic systems (was: (iso639.186) the

2000-09-13 Thread John Hudson
At 01:59 PM 9/13/2000 -0800, [EMAIL PROTECTED] wrote: Would it be too radical to suggest that 'language codes', per se, are one of the least useful things for IT tagging? A blind code, that offers no information about orthography, script variant, or even whether a language is written at all,

Re: Tamil glyphs

2000-09-13 Thread Michael \(michka\) Kaplan
From: "Antoine Leca" [EMAIL PROTECTED] Subject: Re: Tamil glyphs We agree this is an area where we really need some light, and a firmer guide of implementation from the Unicode consortium. What is the way to request a more strong rule of interpretation? I will be submitting something for

Re: Tamil glyphs

2000-09-13 Thread Kenneth Whistler
Antoine asked: We agree this is an area where we really need some light, and a firmer guide of implementation from the Unicode consortium. What is the way to request a more strong rule of interpretation? The way to "request" a stronger rule of interpretation is to write a formal contribution

help regarding UNICODE

2000-09-13 Thread mlinguist
I am enthusiastic to know about the Unicode font system in depth, specifically with regard to Indian languages. Would anyone suggest how to go about all this? What is the difference between Unicode fonts and other fonts? Are there separate Unicode fonts? If yes, where are they available? Is it

help on Unicode

2000-09-13 Thread Sandeep Krishna
hi friends... we are new to Unicode and we are aware of the basic concepts of Unicode and UTF-8 coding... but as far as the implementation of Unicode encoding on platforms like Visual C++ or Visual Basic are concerned, we are pretty much in the dark.. if someone could help us in this