Rick McGowan asked:
Can anyone point me to an existing list of languages that is more
comprehensive and better researched than the Ethnologue? If there is no
such list, then we don't need to consider any alternatives, right?
Ask the closest university department of comparative
Not all code points are assigned (or even assignable) to characters. U+xx
is used to refer to code points, which range from 0 to 10FFFF. Of these code
points, some are assigned to characters (including regular characters, control
characters, format characters, and private use characters
Rick McGowan wrote:
One of the major PROBLEMS with ISO 639, and other such lists developed by
ISO over the years, is that they are not brought into being, or maintained,
with the intent of being comprehensive. They are either intended to, or do
serve, some short-term narrow interests.
At 09:19 -0800 2000-09-12, [EMAIL PROTECTED] wrote:
First, by the definitions assumed in the Ethnologue, they are all
considered to be distinct languages; they would be candidates for separate
literacy and literature development (if currently spoken-only), and if
literature were to be
At 23:56 +0100 2000-09-12, Christopher J. Fynn wrote:
A lot of what are listed as "languages" in the Ethnologue are what most people
would call dialects. For instance almost every known dialect of spoken Tibetan
is listed as a separate language in the Ethnologue although they all share
only
The Library of Congress is very closely involved with ISO 639-2.
In fact, it is mostly their list of codes.
Misha
Oh Michael...
I think there are codes given to entities in the Ethnologue list that
aren't languages in the sense that we need to identify languages in IT
and in
Peter Constable wrote:
- code values: integers within the space of some encoding form; d800 - dfff
*are* code values, but not codepoints
- surrogate: I'm inclined to say that this should refer *only* to a UTF-16
code value in the range d800 - dfff; equal to "surrogate code value"
-
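The distinction drawn above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes the code space of 0 to 10FFFF and the d800 - dfff surrogate range mentioned in this thread; it is not a statement of the final terminology.

```python
from typing import List

MAX_CODE_POINT = 0x10FFFF  # assumed upper bound of the code space


def is_surrogate(value: int) -> bool:
    """True for a UTF-16 code value in the range d800 - dfff."""
    return 0xD800 <= value <= 0xDFFF


def to_utf16(code_point: int) -> List[int]:
    """Return the UTF-16 code values for one code point (hypothetical helper)."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("outside the code space")
    if is_surrogate(code_point):
        raise ValueError("surrogate values do not encode a character on their own")
    if code_point < 0x10000:
        return [code_point]  # one 16-bit code value
    offset = code_point - 0x10000
    # Split the 20-bit offset into a high and a low surrogate.
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


print([hex(v) for v in to_utf16(0x1D11E)])
```

Every value returned for a code point above FFFF falls in the surrogate range, which is why a surrogate is a code value of the UTF-16 encoding form rather than something assigned to a character.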
On 2000-09-12 at 17:43 h UCT, Peter Constable wrote:
ISO 639 codes were primarily intended for bibliography purposes.
Gary and I point out in our paper that the needs of that sector do
not necessarily correspond to the general needs of IT, particularly
for language-specific
Hello,
I have heard that recently there has been a change to
the Spanish collation, where can I obtain more
information on this ? Is this based on a given
standard ?
Thanks
Regards
Nat
Well, there are two collations... the "traditional" one and the "modern" or
"international" one. The primary difference between them is that just
about all of the exceptions in the traditional one have been eliminated in
the modern one.
This was not a recent change though, it was several
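As an illustration of the kind of exception involved: the traditional Spanish order treats the digraphs "ch" and "ll" as letters of their own (after c and l), while the modern order sorts them as plain letter sequences. The sketch below is a simplified model in Python with hypothetical names; it ignores accents and case levels and is not the rule set of any particular platform.

```python
# Simplified alphabets; accented vowels and other levels are deliberately ignored.
ALPHABET_MODERN = list("abcdefghijklmn") + ["ñ"] + list("opqrstuvwxyz")
ALPHABET_TRADITIONAL = (
    list("abc") + ["ch"] + list("defghijkl") + ["ll"]
    + list("mn") + ["ñ"] + list("opqrstuvwxyz")
)


def sort_key(word, alphabet):
    """Map a word to a list of ranks, matching two-letter units first."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    key = []
    lowered = word.lower()
    i = 0
    while i < len(lowered):
        pair = lowered[i:i + 2]
        if len(pair) == 2 and pair in rank:
            key.append(rank[pair])  # digraph counted as one letter
            i += 2
        else:
            # Unknown characters sort after the alphabet in this sketch.
            key.append(rank.get(lowered[i], len(alphabet)))
            i += 1
    return key


words = ["cuna", "chico", "dedo", "luna", "llama"]
modern = sorted(words, key=lambda w: sort_key(w, ALPHABET_MODERN))
traditional = sorted(words, key=lambda w: sort_key(w, ALPHABET_TRADITIONAL))
print(modern)       # ['chico', 'cuna', 'dedo', 'llama', 'luna']
print(traditional)  # ['cuna', 'chico', 'dedo', 'luna', 'llama']
```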
At 02:10 AM 9/14/2000 -0700, [EMAIL PROTECTED] wrote:
The problem here is that ISO639 has, for better or worse, been adopted by
a wide array of DIFFERING applications. It's a convenience standard that
we vaguely have to live with.
No, it's an inconvenience standard that we vaguely have to live
Re the Linguasphere, Peter C wrote:
- As Chris mentioned, the info isn't available online.
Actually, the Linguasphere is available on-line, if you pay for it... One hundred
sixty pounds sterling (two hundred seventy-five US dollars) for a license to use the
electronic version.
Rick
With English, the problem with spell checking is quite different, and different
lists of words would not be as easy for a solution: the en-US vs. en-GB
tagging does not seem to adequately cover the various differences such as
-ise vs. -ize, -our vs. -or, -re vs. -er, use of shall vs.
Otto Stolz wrote:
I think the Ethnologue lacks information about variant orthographies.
Yes, it does. But that's OK, because we can make a composite tagging system that tags
orthography separately from language.
So... does anyone have a comprehensive list of orthographies?
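A composite tag of the kind suggested here could be modelled roughly as follows. This is only a sketch in Python: the class names, the helper, and the orthography labels ("ize", "ise") are hypothetical placeholders, since no list of orthography tags exists in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tag:
    language: str                      # e.g. a language code such as "en"
    orthography: Optional[str] = None  # hypothetical label, tagged separately


@dataclass
class Run:
    text: str
    tag: Tag


def runs_matching(runs: List[Run], language: str,
                  orthography: Optional[str] = None) -> List[Run]:
    """Select runs by language, optionally narrowed by orthography."""
    return [
        r for r in runs
        if r.tag.language == language
        and (orthography is None or r.tag.orthography == orthography)
    ]


document = [
    Run("civilize the colour centre", Tag("en", "ize")),
    Run("civilise the colour centre", Tag("en", "ise")),
    Run("untagged orthography", Tag("en")),
]
print(len(runs_matching(document, "en")))         # 3
print(len(runs_matching(document, "en", "ise")))  # 1
```

Keeping the two fields separate means a process that only cares about language can ignore the orthography field, while a spell checker can filter on both.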
Rick
Tom Emerson wrote:
One (well, the only) problem I have with explicit orthographic tagging
is that it makes assumptions that a consistent orthography is being
used throughout a document, which isn't necessarily the case. This is
particularly prevalent in East Asian languages:
Well, the tags
From: Arnt Gulbrandsen [mailto:[EMAIL PROTECTED]]
Are there valid reasons why the imperfect but comprehensive needs to be a
standard? I can see one reason for it _not_ to be a standard: A list can
be added to faster, so it's easier for a list to be truly comprehensive.
On 09/13/2000 01:47:57 AM Mark Davis wrote:
Not all code points are assigned (or even assignable) to characters. U+xx
is used to refer to code points, which range from 0 to 10FFFF. Of these code
points, some are assigned to characters (including regular characters, control
characters,
- Forwarded by Peter Constable/IntlAdmin/WCT on 09/13/2000 01:21 PM -
From: "Sandeep Krishna" sandeepkrishna@noida.hclt.com
Michael Everson wrote (amplified by me):
tire, civilize, color, center (US)
tyre, civilize, colour, centre (GB-Oxonia)
tyre, civilise, colour, centre (GB-Demotica)
tire, civilise, colour, centre (CA)
I have seen a photograph of an actual Canadian sign saying "Tire Centre",
which in GB
It takes a long time for data to work its way into an ISO standard.
This generalisation is unhelpful. Consider ISO 4217, the currency code
standard. As soon as the Maintenance Agency (MA) has been notified by a
competent authority (in this case, a central bank) of a legitimate
currency
Antoine Leca wrote 1/4 hour ago:
Nat Langs wrote:
I have heard that recently there has been a change to
the Spanish collation, where can I obtain more
information on this ?
I am presently unable to direct you to the specific
decision (of the meeting of the Academias de la
Lengua
On 09/13/2000 01:39:37 AM JÖRG KNAPPEN wrote:
I once looked at the Ethnologue and its subdivision of the German language
is just ridiculous. Not small errors, a gross misconception. I don't trust
the Ethnologue in areas where I don't know the facts well, since it fails in
one area where I know
Nat Langs wrote:
I have heard that recently there has been a change to
the Spanish collation, where can I obtain more
information on this ?
If you are able to read Spanish, look at
URL:http://www.rae.es/NIVEL1/LEMAS/ch.htm
URL:http://www.rae.es/NIVEL1/NORMAS.HTM
I am presently
FWIW, this is indeed the collation supported by LCID 0x0C0A (3082) under
Windows, named "Spanish - Modern" under most versions of Windows and
"Spanish - International" under Windows 2000.
The other collation is "Spanish - Traditional" and its LCID is 0x040A
(1034).
All functions under Windows
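The two LCIDs differ only in their sublanguage bits. A minimal Python sketch, assuming the usual Windows layout (low 16 bits are the language identifier, of which the low 10 bits are the primary language and the upper 6 bits the sublanguage; bits 16 to 19 are the sort identifier). The function name is hypothetical and this is not a Windows API call.

```python
def split_lcid(lcid):
    """Return (primary language, sublanguage, sort id) for an LCID value."""
    langid = lcid & 0xFFFF          # language identifier
    sort_id = (lcid >> 16) & 0xF    # sort identifier
    primary = langid & 0x3FF        # primary language, 0x0A for Spanish
    sublanguage = langid >> 10      # distinguishes the two collations here
    return primary, sublanguage, sort_id


print(split_lcid(0x0C0A))  # (10, 3, 0)  modern / international
print(split_lcid(0x040A))  # (10, 1, 0)  traditional
```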
-Original Message-
From: Gary Deleel [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 13, 2000 1:35 PM
To: [EMAIL PROTECTED]
Subject: compliance with sql
To you,
Basically I have been given the job of finding what problems lie within
SQL Server when Unicode is involved.
My point is that for some languages there is no single
orthography that can ever be nailed down.
Yes of course. But there's nothing to prevent the development of a system of
orthographic tags, and nothing to prevent combining orthographic tags with language
tags for complete mix-and-match
-Original Message-
From: Dieter Hoffmann [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 13, 2000 12:43 PM
To: [EMAIL PROTECTED]
Subject: Printing issues
Dear Unicode People,
Are there known issues between the way AMD K6/2
handles Unicode when sent to printer by Office97?
In
On 09/13/2000 02:17:52 AM John Hudson wrote:
The first
tasks should be to a) identify the different kinds of information that need
to be represented by tags (spoken languages, written languages, literary
languages (not the same thing as a written language), particular
orthographies,
Tom Emerson wrote:
One (well, the only) problem I have with explicit orthographic tagging
is that it makes assumptions that a consistent orthography is being
used throughout a document, which isn't necessarily the case. This is
particularly prevalent in East Asian languages:
Japanese
(Apologies for the cross-listing, but this has spanned several lists, and
there are parties on each that are not all on one and that are interested
in the discussion.)
On 09/13/2000 06:37:02 AM Michael Everson wrote:
At 23:56 +0100 2000-09-12, Christopher J. Fynn wrote:
A lot of what are
On 09/13/2000 10:25:21 AM Antoine Leca wrote:
While I agree with you, there are anyway problems with the way languages
are distinguished...
Some comments in response:
- This is not primarily about major languages. They generally already have
the identifiers they need. In addition, because of
On 09/13/2000 06:37:25 AM Michael Everson wrote:
First, by the definitions assumed in the Ethnologue, they are all
considered to be distinct languages; they would be candidates for separate
literacy and literature development (if currently spoken-only), and if
literature were to be developed,
Paresh Agarwal wrote (privately):
I am really enthusiastic to know about Unicode in depth.
Can you suggest how to go about all this?
I learned a lot by asking questions on the Unicode List ([EMAIL PROTECTED]).
There are plenty of people there who can answer your questions in detail.
All the best
On 09/13/2000 10:49:49 AM John Hudson wrote:
Would it be too radical to suggest that 'language codes', per se, are one of
the least useful things for IT tagging? A blind code, that offers no
information about orthography, script variant, or even whether a language is
written at all, simply does
On 09/13/2000 11:59:01 AM Rick McGowan wrote:
Re the Linguasphere, Peter C wrote:
- As Chris mentioned, the info isn't available online.
Actually, the Linguasphere is available on-line, if you pay for it... One
hundred sixty pounds sterling (two hundred seventy-five US dollars) for a
license
On 09/13/2000 12:04:24 PM "Ayers, Mike" wrote:
What I'd really like to know is why there seems to be this
insistence on only one official list of languages when there appears to be
a clear need for two. There appears to be interest for a comprehensive, if
imperfect, list on one hand,
On 09/13/2000 09:09:12 AM Otto Stolz wrote:
For many language-specific IT processes involving written language,
such as spell-checking, hyphenating, transliterating (e. g. to Braille),
or audible rendering, it is not enough to know which language you are
dealing with: you also need information
Tom Emerson wrote:
Rick McGowan writes:
Well, the tags don't force one to tag an entire document with the
same tag. They just provide a space. It's up to the document
format to decide whether to tag a whole document or tag by
language/orthography runs.
That isn't the point I'm trying
Marco Cimarosti wrote:
Antoine Leca wrote:
I am not sure this is the only way to interpret the use of ZWNJ here.
Another way would be to consider the sequence ka+halant to be
a separate syllable, and then ka+i to be a second syllable. Then,
the correct rendering would be
At 01:59 PM 9/13/2000 -0800, [EMAIL PROTECTED] wrote:
Would it be too radical to suggest that 'language codes', per se, are one of
the least useful things for IT tagging? A blind code, that offers no
information about orthography, script variant, or even whether a language is
written at all,
From: "Antoine Leca" [EMAIL PROTECTED]
Subject: Re: Tamil glyphs
We agree this is an area where we really need some light, and a firmer guide
of implementation from the Unicode consortium. What is the way to request
a more strong rule of interpretation?
I will be submitting something for
Antoine asked:
We agree this is an area where we really need some light, and a firmer guide
of implementation from the Unicode consortium. What is the way to request
a more strong rule of interpretation?
The way to "request" a stronger rule of interpretation is to write a
formal contribution
I am enthusiastic to know about the Unicode font system in depth,
specifically with regard to Indian languages. Would anyone suggest how to
go about all this?
What is the difference between Unicode fonts and other fonts? Are there
separate Unicode fonts? If yes, where are they available? Is it
hi friends...
we are new to Unicode and we are aware of the
basic concepts of Unicode and UTF-8 coding...
but as far as the implementation of Unicode
encoding on platforms like Visual C++ or
Visual Basic are concerned,
we are pretty much in the dark..
if someone could help us in this