On Fri, 8 Nov 2002, Magda Danish (Unicode) wrote:
-----Original Message-----
Date/Time:Fri Nov 8 09:05:40 EST 2002
Contact: [EMAIL PROTECTED]
Report Type: Other Question, Problem, or Feedback
Hello
I just wanted to know how much space in bytes the Latin-1
Wednesday, September 25, 2002
A friend of a friend asked me if Unicode has a code for small s with a
grave. I can't find one; am I overlooking it? Has it been added
since 3.0? Thanks in advance.
Regards,
Jim Agenbroad ( [EMAIL
On Thu, 29 Aug 2002, Eric Muller wrote:
For my personal use, I would like to acquire electronic dictionaries,
principally for the major European languages, with the following
characteristics:
- reputable source
- raw datafiles accessible - I appreciate the interfaces that
dictionary
On Fri, 23 Aug 2002 [EMAIL PROTECTED] wrote:
On 08/23/2002 04:54:58 AM Doug Ewell wrote:
For those who like to keep up on such things, there have been recent
changes to the code lists of two important standards related to
internationalization -- ISO 639 (language codes) and ISO 3166-2
On Tue, 20 Aug 2002, Andrew C. West wrote:
On Tue, 20 August 2002, John Cowan wrote:
It has no sound, but neither does Romance "h"; both exist as a
marker of
etymology.
But in fact the apostrophe may have a sound in dialectal English, where it is
used to represent a
On Tue, 20 Aug 2002, Michael Everson wrote:
At 10:10 -0700 2002-08-20, Andrew C. West wrote:
On Tue, 20 August 2002, John Cowan wrote:
It has no sound, but neither does Romance "h"; both
exist as a marker of etymology.
But in fact the apostrophe may have a sound in dialectal
On Fri, 16 Aug 2002, John Cowan wrote:
John Hudson scripsit:
The newish Gagauz Turkish Latin-script orthography derives from both
Turkish and Romanian models. This has led to a peculiar hybrid, in which
the cedilla is used for the s and the commaaccent is used for the t.
ME's
On Thu, 25 Jul 2002, Kenneth Whistler wrote:
[snip]
And the devil is in the details. Looking a bit at your suggestions,
for example:
[snip]
Friday, July 26, 2002
No, "God is in the details," Ludwig Mies van der Rohe (1886-1969) said. And
that's
On Wed, 24 Jul 2002, Tex Texin wrote:
John Hudson wrote:
At 08:41 AM 24-07-02, [EMAIL PROTECTED] wrote:
from:Doug Ewell [EMAIL PROTECTED]
subject: Re: The standard disclaimer
James Kass jameskass at worldnet dot att dot net wrote:
However, just
On Tue, 9 Jul 2002, Kenneth Whistler wrote:
David Hopwood wrote:
Marco Cimarosti wrote:
The only difficulty would have been if a pre-existing standard had supported
both precomposed and decomposed encodings of the same combining mark. I don't
On Wed, 10 Jul 2002, Roozbeh Pournader wrote:
On Thu, 27 Jun 2002, Marco Cimarosti wrote:
Encoding the navy's flag alphabet or the Morse code would be exactly doing
this: assigning a code to a code which represents a letter.
BTW, which characters should be used to encode the dot and
On Wed, 10 Jul 2002, John Cowan wrote:
James E. Agenbroad scripsit:
The standards I cited use both
techniques (precomposed and decomposed letter+diacritic) but they don't
allow two ways of creating a single letter+diacritic combination the way
ISO10646/Unicode do.
Even
On Wed, 3 Jul 2002, Michael Everson wrote:
At 11:48 +0100 2002-07-03, Anthony Stone wrote:
I should be very glad if someone could solve the mystery of what
Sanskrit and/or Tibetan characters correspond to the following Unicode
characters:
1883 MONGOLIAN LETTER ALI GALI UBADAMA
1884
On Thu, 27 Jun 2002, Keld Jørn Simonsen wrote:
On Thu, Jun 27, 2002 at 11:59:14AM +0200, Lars Marius Garshol wrote:
This list has previously told me that the characters 0x80 - 0x9F in
ISO 8859-1 are a particular set of control characters from ISO 6429.
[snip]
I now see that ISO
Tuesday, June 4, 2002
Does anyone have a copy of the printed proceedings of the recent
International Unicode Conference held in Dublin that they would be willing
to part with? I could afford U.S. postage costs. Only the CD version is
available from the
On Tue, 30 Apr 2002, Michael Everson wrote:
At 11:55 +0200 2002-04-30, Lars Marius Garshol wrote:
* Stefan Persson
|
| Isn't the reversed lower-case c somewhere in the IPA block?
Could be, but I need reversed lower-case 'c' followed by colon as a
single character.
Also, I am very
On Mon, 22 Apr 2002, Doug Ewell wrote:
Zsigri Gyula [EMAIL PROTECTED] wrote:
How many printable characters are there in Unicode 3.2.0? I tried
desperately to find the answer at the Unicode web site but could
not.
There are 95,156 total assigned characters.
To find the number of
On Fri, 29 Mar 2002, Doug Ewell wrote:
Avarangal [EMAIL PROTECTED] wrote:
I need to allocate a U+codepoint for inherent a, to be used for
Tamil research. Can anyone suggest a temporary location or is it
possible to find such code point within the existing code point
for Tamil.
On Mon, 25 Mar 2002, Markus Scherer wrote:
Chookij Vanatham wrote:
UTR#14:Line Breaking says that, Interpretation of line breaking properties
in bidirectional text takes place before applying rule L1 of the Unicode
Bidirectional Algorithm.
UTR#9:Bidirectional says that, [at the
On Wed, 20 Mar 2002, John Cowan wrote:
John H. Jenkins scripsit:
(His point is that if you have kanji in an IDN you can't tell whether to
draw it the Japanese way or the Chinese way, of course, and since
civilization as we know it depends on Japanese people never being
confronted
On Sun, 17 Mar 2002, Miikka-Markus Alhonen wrote:
On 17-Mar-02 Curtis Clark wrote:
At 04:45 PM 3/16/02, Doug Ewell wrote:
But right away that definition includes not only Shavian, Tengwar,
Cirth, Klingon, and most of the contents of ConScript, but also
Ethiopic, Cherokee, Canadian
On Fri, 15 Mar 2002, Kenneth Whistler wrote:
Dan Kogai continued:
[snip]
His
favorite appears to be ISO-2022 but as Yet Another Perl Encoding Hacker,
ISO-2022 is pain in the arse
You got that right!
--Ken
Monday, March
On Wed, 13 Mar 2002, William Overington wrote:
Here is a system that I think would work.
Consider please that there exists for the private use area the concept of
the hexadecimal point. The term hexadecimal point is similar to the
concept of a decimal point, the difference being that a
On Tue, 12 Mar 2002, John Cowan wrote:
[snip]
(In truth neither of us has had much time to process new registrations
lately. Arse longa, vita brevis.)
[snip]
--
John Cowan [EMAIL PROTECTED] http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,
On Wed, 13 Mar 2002, Michael Everson wrote:
Um,
What I think is that *I* for one am certainly not going to invest any
effort in pseudo-coding scripts in a PreScript Unicode Registry.
The work to get scripts proposed and encoded is enough. If someone is
interested in a script, and wants
On Fri, 8 Mar 2002, Marco Cimarosti wrote:
Peter Constable wrote:
On 03/07/2002 02:16:10 PM James E. Agenbroad wrote:
A similar but not the same situation is found in the fourth
example in
figure 9-3 of Unicode 3.0 (page 214) where an independent
vowel has the
reph
On Fri, 8 Mar 2002, Michael Everson wrote:
At 15:16 -0500 07/03/2002, James E. Agenbroad wrote:
On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:
On 03/06/2002 08:25:18 AM Michael Everson wrote:
[snip]
In
Cham, independent vowels can take dependent vowel signs
On Fri, 8 Mar 2002 [EMAIL PROTECTED] wrote:
Jim Agenbroad responded (off list):
Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a
RI
vowel: RA(d) + RI(n) -- RI(n) + RA(sup) (parens in lieu of subscript)
I didn't realise that RI meant the vocalic R. I mistook it
On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:
On 03/06/2002 08:25:18 AM Michael Everson wrote:
[snip]
In
Cham, independent vowels can take dependent vowel signs. In
Devanagari, I guess that doesn't occur, but the Brahmic model
shouldn't be understood to preclude this
On Tue, 5 Mar 2002, Doug Ewell wrote:
Dhrubajyoti Banerjee [EMAIL PROTECTED] wrote:
[quoting Akshor]
I think we need not be restrained by these so-called 'standards'.
Because,
they can't and will not serve our need (Bengali) in my humble view.
That's
why we took this project at our
Friday, March 1, 2002
Would I be correct in assuming that the Euro is also now the currency in
non-European dependencies such as the Netherlands Antilles, French
Polynesia, etc.? Apologies in advance if either of these is now
independent.
On Fri, 1 Mar 2002, Patrick Andries wrote:
Marco Cimarosti wrote:
John Cowan wrote:
[...] House numbers in North America (and in France
also, it seems) have a few bits of meaning: the least-significant
(numeric) bit tells you which side of the street the house is on,
[...]
It
On Mon, 25 Feb 2002, Marco Cimarosti wrote:
John Hudson wrote:
At 06:33 2/25/2002, Marco Cimarosti wrote:
Alain LaBonté wrote:
[...] Who knows? What is the word for gipsy in Romanian? [...]
Rom, in fact: I just asked this to a Rumanian colleague.
I presumed this was the
On Tue, 19 Feb 2002, Asmus Freytag wrote:
At 09:52 PM 2/18/02 -0800, Doug Ewell wrote:
So if some language turns out to need
a with horn in the future, its readers will have to cross their fingers
that rendering engines become capable of displaying U+0061 U+031B
properly.
Support for such
Thursday, February 7, 2002
Would making the about-to-be-misled respondent type the address of the
intended person (with a roman 'o', not a greek omicron) and then having
the system see if they match detect and thwart such tricks? The
respondent is
The ALA/LC romanization tables are at: lcweb.loc.gov/catdir/cpso/roman.html
(not .../romanization.html as in my earlier note)
Sorry,
Jim Agenbroad ( [EMAIL PROTECTED] )
It is not true that people stop pursuing their dreams because they
grow old, they grow old because they
Wednesday, February 6, 2002
The scanned pages of the 1997 ALA/LC romanization tables are now available
on the Web: http://lcweb.loc.gov/catdir/cpso/romanization.html
Note that in lieu of the Wade Giles pages there is a note that pinyin
guidelines are
On Mon, 4 Feb 2002, Michael Everson wrote:
At 12:33 -0800 2002-02-03, Mark Davis wrote:
This has bitten more than a few people. For political reasons, having
to do with the synchronization of names to ISO 10646, the name fields
are empty for the control characters. That is because (at least
Thursday, November 15, 2001
On pages A14-15, the November 9 issue of the Chronicle of Higher Education
has an article, "Silicon Babylon," by Scott McLemee on the Cuneiform Digital
Library Initiative. It seems they're using digital images, not character
On Wed, 7 Nov 2001, Philipp Reichmuth wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello folks,
I've been wondering a little bit recently about the definition of
character vs. glyph variant that is applied during decision
whether or not a given proposed character should go into
On Thu, 27 Sep 2001, John Hudson wrote:
At 02:48 9/27/2001, Marco Cimarosti wrote:
A lot of time ago, someone on this list mentioned a language, written in the
Cyrillic alphabet, which employed letter Q, taken from the Latin alphabet.
Which language is it?
Kurdish. The common Cyrillic
On Wed, 19 Sep 2001, Carl W. Brown wrote:
Ram,
If ISCII is intended as a pan-Indic solution does it also support Urdu?
Carl
Wednesday, September 19, 2001
No, from the foreword to ISCII: As Perso-Arabic scripts have a different
alphabet, a different
On Wed, 19 Sep 2001, Rick McGowan wrote:
If ISCII is still being developed does this suggest that Unicode and its ISO
equivalent move too slowly?
ISCII dates back to 1988 with a revision in 1990. It's not still being
developed -- as far as I know, it's a stable standard that is under
On Tue, 18 Sep 2001, Magda Danish (Unicode) wrote:
-----Original Message-----
From: Bernard Miller [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 17, 2001 5:19 PM
To: [EMAIL PROTECTED]
Subject: 6 questions
Hello,
These are the questions I wanted to
ask:
1. [snip]
6.
On Thu, 6 Sep 2001, Ayers, Mike wrote:
From: David Starner [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 06, 2001 01:40 PM
On Thu, Sep 06, 2001 at 04:03:07PM +0200, Thierry Sourbier wrote:
The only little thing to know about French and diacritical
mark is that when
On Fri, 6 Jul 2001, Rajesh Chandrakar wrote:
James Kass wrote:
Adarsh wrote:
[snip]
Another problem has to do with searching/indexing. Search/index
applications
are broken by non-Standard encodings.
but how far searching and indexing is possible for encoded standards?
On Tue, 19 Jun 2001, Marco Cimarosti wrote:
Peter Constable wrote:
Can anyone think of other examples of informative properties
that are so
because the property is typical but not true for all languages?
[snip]
I arrived late to this discussion. Is culturally correct
Tuesday, June 12, 2001
Did the Lion dip his thorn in ink?
Jim Agenbroad (disclaimer and addresses at bottom)
On Mon, 11 Jun 2001, John Hudson wrote:
At 15:56 6/11/2001 +0100, Michael Everson wrote:
Shaw, Bernard. 1962. Androcles the
Thursday, May 31, 2001
We seem to have strayed from searching for a clearer term than Asian. I
think part of the problem is that many language names are also national
adjectives, e.g., Chinese, Japanese and Korean. Likewise names of scripts
(or
Thursday, May 31, 2001
My goal was never to give a specific number of glyphs needed to display a
particular Indian or other script. As others have pointed out, this
depends among other things, on the particular display device and its font
processing
of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.
---------- Forwarded message ----------
Date: Fri, 10 Sep 93 14:12:07 -0400
From: jage (James E. Agenbroad)
To: [EMAIL PROTECTED]
Cc: jage@seq1
Subject: Some Character to Glyph Statistics
Tuesday, May 22, 2001
My recollection is that assigning separate codes to all characters in
Coptic script rather than treating it as part of Greek script was under
consideration at one time. If so, is this effort's current status closer
to
On Tue, 27 Mar 2001, Tony Graham wrote:
At 27 Mar 2001 12:37 -0500, James E. Agenbroad wrote:
On page 125 of the 2000 cumulation of 'Computer literature index' under
the subject heading 'Conversion' the annotation for "Unicode: a primer" by
Tony Graham says: "Unicode i
Tuesday, March 27, 2001
On page 125 of the 2000 cumulation of 'Computer literature index' under
the subject heading 'Conversion' the annotation for "Unicode: a primer" by
Tony Graham says: "Unicode is a programming standard and coding system for
On Fri, 23 Mar 2001, Jonathan Coxhead wrote:
It would be very entertaining to do the same job with the ideographs (down
to the radical level) and count the number of atoms. I suspect the resulting
"character set" would contain less than 2000 atoms altogether.
Please do feel free
Tuesday, March 13, 2001
Those interested in Indic and related scripts might want to consult:
http://www.cs.colostate.edu/~malaiya/scripts.html
[That's a tilde before malaiya.] Not all the links from it are operational,
but many are.
Regards,
On Sat, 10 Mar 2001, Jonathan Rosenne wrote:
Regarding Hebrew:
-----Original Message-----
From: Nick NICHOLAS [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 09, 2001 10:12 PM
To: Unicode List
Cc: Nick NICHOLAS
Subject: Final letters in Hebrew and Arabic
(1) When a letter with a
On Fri, 9 Mar 2001 [EMAIL PROTECTED] wrote:
On 03/09/2001 11:01:53 AM "Tex Texin" wrote:
We have estimates for (human) language usages on the web
Do you mean the number of different languages used on the web? I'd be
curious to know what such estimates are.
- Peter
On Thu, 8 Feb 2001, Michael Everson wrote:
At 04:48 -0800 2001-02-08, J M Sykes quoted the FT:
The International Standards Organisation (ISO) has now agreed to give
standard meanings to these remaining codes.
Which as everyone knows, is really the International Organization for
Wednesday, January 31, 2001
In the chapter on Tibetan in Daniels and Bright's The world's writing
systems (page 434) about prescript symbols: "There are six radicals that
never occur with a prescript: wa, ra, la, ha, and 'a chung." Does anyone
know what the
Friday, January 19, 2001
In what order are ranges of numbers such as 15-23 expressed in a bidi
context? 1. What is wanted visually, if there is one consistent
expectation? 2. Then what order should the codes be stored in Unicode for
the bidi algorithm to
On Wed, 17 Jan 2001 [EMAIL PROTECTED] wrote:
On 01/17/2001 05:13:25 AM Michael Everson wrote:
A + Ldep
No such thing as Ldep in our model, so you'd have to rely on A + virama +
L.
Well, if a script had such behaviour, one possibility could be to propose a
combining CONSONANT
On Thu, 30 Nov 2000, Antoine Leca wrote:
Carl W. Brown wrote:
#3 French also has other articles such as d'.
Yes. But this one, contrary to "l'" can according to the context,
either be the contraction (élidé) of "de", or can be a genuine
part of a proper name... When it comes to
Wednesday, November 14, 2000
Oh I see the long right leg is straight. Sorry.
Regards,
Jim Agenbroad ( [EMAIL PROTECTED] )
The above are purely personal opinions, not necessarily the official
views of any government or any agency of
On Wed, 8 Nov 2000, Apurva Joshi wrote:
The RA[sup] is seen applied to the independent vowel Vocalic R (U+090B) in
printed samples in Sanskrit.
There are at least the following words that contain the above:
NaiRiTa (the name of a demon)
= 0928 090B Ra[sup] 0924
NaiRiTi (the goddess
Thursday, November 8, 2000
After sending a comment on the Ra(sup) + independent vowel discussion two
more general Devanagari questions occurred to me:
1. Is a halant/virama ever valid following other than a consonant (or
consonant and nukta)? My
On Thu, 9 Nov 2000, Rick McGowan wrote:
1. Is a halant/virama ever valid following other than a consonant (or
consonant and nukta)?
Legal? In the sense of "any string is legal", yes; as is anything else.
The implementation question to answer is whether it's useful or
renderable, and
On Tue, 31 Oct 2000, James E. Agenbroad wrote:
On Mon, 30 Oct 2000, Michael (michka) Kaplan wrote:
Most of this happens to be in the Windows NLS database. See GetLocaleInfo in
MSDN for details:
http://msdn.microsoft.com/library/psdk/winbase/nls_34rz.htm
Or more specifically
On Wed, 18 Oct 2000 [EMAIL PROTECTED] wrote:
Jon Babcock wrote:
It seems to me that if not for that, how could anyone
make a Chinese font? Who is going to sit down and
draw a *myriad* or more characters? Since elements
recur, this reduces the amount of labour required
greatly.
I
On Wed, 18 Oct 2000 [EMAIL PROTECTED] wrote:
Doug Ewell wrote:
Marco Cimarosti [EMAIL PROTECTED] wrote:
Carl W. Brown:
An article in the October 12, 2000 issue of Linux Weekly News
http://lwn.net/bigpage.php3 tries to explain the benefit...
Actually, that quote from Linux Weekly
On Tue, 10 Oct 2000, Majid Bhurgri wrote:
On Tues, 10 Oct 2000, Roozbeh Pournader wrote:
It's somehow weird for me, and if it were me, I would have considered it
non-joining. Why would it appear between two letters that would otherwise
join? Arabic cannot be broken between the joining
On Tue, 19 Sep 2000, Mark Davis wrote:
If those can be confirmed, then the SpecialCasing file should be modified to add
them. Could you verify this in time for the next UTC?
Mark
Cathy Wissink wrote:
I believe Azeri also uses the dotless i/dotted i Turkish-style casing.
Cathy
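As an illustration of the Turkish-style casing mentioned above, here is a minimal Python sketch. It assumes the language tags "tr" and "az" and hard-codes the dotted/dotless i pairs; the function names are hypothetical. Python's built-in str.lower() and str.upper() are not language-sensitive, so the special cases are handled by hand before calling them.

```python
# Minimal sketch of Turkish-style (dotted/dotless i) casing.
# Assumption: "tr" and "az" are the language tags that need it;
# lower_turkic and upper_turkic are hypothetical helper names.

TURKIC_LANGS = {"tr", "az"}

DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def lower_turkic(text: str, lang: str) -> str:
    """Lowercase text, mapping I -> dotless i and dotted I -> i for tr/az."""
    if lang in TURKIC_LANGS:
        # Handle the two special pairs first, then let str.lower() do the rest.
        text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()


def upper_turkic(text: str, lang: str) -> str:
    """Uppercase text, mapping i -> dotted I and dotless i -> I for tr/az."""
    if lang in TURKIC_LANGS:
        text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()
```

With any other language tag the two functions fall through to the default, language-neutral case mappings.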
Tuesday, September 12, 2000
Last Friday was International Literacy Day here at LC. SIL was among
those distributing literature here. From it I gather their goal is to
define and implement writing systems for many presently unwritten
languages
On Wed, 23 Aug 2000, Jaap Pranger wrote:
At 18:05 +0200 2000.08.23, James E. Agenbroad wrote:
In a list of Devanagari conjuncts I compiled a while ago there are at
least two cases of conjuncts in which both consonants have a nukta:
1. Ka + nukta + halant + ka + nukta = qqa
2. Ka
On Mon, 31 Jul 2000, Christopher J. Fynn wrote:
Leaving aside implementation costs - has anyone ever come up with a good
estimate of the cost per character for the development of the Unicode / ISO
10646 standards in terms of man hours of experts and their long-suffering
secretaries, the