In a message dated 2001-05-28 9:11:44 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
I fear you have undertaken something hopeless. One could transliterate
U+0429 as SHCH or S^C^ or any number of other things, but that is only
appropriate for Russian. In Bulgarian, the only natural
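The point that one code point has no single Latin spelling can be sketched as a lookup table keyed by language. This is a minimal Python sketch under stated assumptions. The table, the function name `transliterate`, and the Bulgarian value "SHT" are all hypothetical and for illustration only; the message above breaks off before giving the Bulgarian form.

```python
# Hypothetical per-language tables: the same code point, U+0429,
# needs a different Latin spelling depending on the language of the text.
TRANSLIT: dict[str, dict[str, str]] = {
    "ru": {"\u0429": "SHCH", "\u0449": "shch"},
    # Assumption: "SHT" follows common Bulgarian romanization practice;
    # it is not taken from the message above.
    "bg": {"\u0429": "SHT", "\u0449": "sht"},
}


def transliterate(text: str, lang: str) -> str:
    """Replace characters found in the language's table; pass others through."""
    table = TRANSLIT[lang]
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate("\u0429", "ru"))  # SHCH
print(transliterate("\u0429", "bg"))  # SHT
```

The text alone does not say which table applies. The language has to be known from outside the characters themselves, which is why a single canonical transliteration per code point does not work.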
I apologize for sending the previous message three times. My e-mail client
told me the first two attempts had been unsuccessful.
-Doug Ewell
Fullerton, California
Can someone please help me understand whether support for double byte is the
same as being Unicode compliant. Any elaboration would be greatly
appreciated. If, for instance, being Unicode compliant has any additional
value or benefits, I'd like to understand how and why!
Thanks,
Jim Williams
James Williams wrote:
Can someone please help me understand whether support for
double byte is the same as being Unicode compliant.
No.
Double byte normally refers to the national character sets used in China,
Japan and Korea (much older than Unicode). As these languages require
thousands of
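The difference can be seen by encoding the same text in a legacy double-byte character set and in a Unicode encoding form. This Python sketch uses Shift_JIS as a representative Japanese double-byte set. That choice is an assumption, because the message does not name a specific set. The sketch shows three things. The byte values differ from UTF-16. ASCII letters still take one byte. Characters outside the legacy repertoire cannot be encoded at all.

```python
text = "日本語"  # "Japanese language", written with three kanji

sjis = text.encode("shift_jis")   # legacy double-byte set
utf16 = text.encode("utf-16-be")  # Unicode, 16-bit code units

# Both happen to use two bytes per kanji here, but the byte values are unrelated.
print(len(sjis), sjis.hex())
print(len(utf16), utf16.hex())

# "Double byte" sets are really mixed-width: ASCII stays single-byte.
print(len("A日".encode("shift_jis")))  # 3

# Shift_JIS has no Hangul, so Korean text cannot be represented in it.
try:
    "한".encode("shift_jis")
except UnicodeEncodeError:
    print("not encodable in Shift_JIS")
```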
On 05/28/2001 05:30:15 AM Doug Ewell wrote:
I know that neither UTC nor WG2 engages in the very controversial business
of assigning canonical transliterations between scripts
No, but ISO TC46/SC2 does. http://www.elot.gr/tc46sc2/
The goal is to improve an existing program I wrote which
Dear Unicoders:
While surfing the net, a link with the word ASIAN most of the time
leads to a Chinese, Japanese or Korean site. Is that not confusing?
There are many nations and countries in Asia!
But today I was more confused when I opened the Microsoft Word XP
Font dialog; it has three Font
On 05/27/2001 08:03:37 PM Jianping Yang wrote:
But it seems to me that we've lived without
Premise B in the past, and that it won't benefit us to adopt it now. Why
bother with it? Why not continue doing what we already know how to do?
As a matter of fact, the surrogate or supplementary
On 05/29/2001 02:02:36 AM James Williams wrote:
Can someone please help me understand whether support for double byte is
the same as being Unicode compliant.
No.
Any elaboration would be greatly appreciated.
Oh, you'd like an explanation? :-)
Double byte refers to a variety of legacy
On 05/29/2001 05:12:39 PM N.R.Liwal wrote:
I think calling CJK specifically Asian is neither appropriate nor helpful,
because Asia is big and has hundreds of languages and scripts; either all
Asian scripts, i.e. Arabic, Hebrew, Devanagari, Bengali, Thai, etc., should
be called Asian
Why are the Braille characters classified as Other Neutrals regarding
bidi? Shouldn't they be Left-to-right? Does any Right-to-Left Braille
exist anywhere in the world?
--roozbeh
Trying to translate an English sentence often causes problems.
Does "hurt" mean
1. Injure
2. Cause pain to
3. Both?
I believe the intention of the sentence "I can eat glass and it doesn't
hurt me" is to convey the idea that the speaker is... eccentric, which
would characterize someone who
At 4:39 AM -0700 5/25/01, [EMAIL PROTECTED] wrote:
I thought that Yiddish was a language without a home.
★じゅういっちゃん★
Although Yiddish is one of the best examples of a language without an
army or navy, it is a dialect of Old High German. It was spoken
everywhere that German was,
Billancourt, 1 April 2001,
I was thinking about this while reading the thread about UTF-8s.
If the binary order of UTF-16 is of such prime interest that the
(numerous) users of UTF-8 should slightly modify their code
to co-operate with UTF-16-based database engines, by
accepting UTF-8s rather
Roozbeh Pournader wrote:
Does any Right-to-Left Braille exist anywhere in the world?
I know that Hebrew Braille is left-to-right.
_ Marco
In a message dated 2001-05-29 7:10:48 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
I think calling
CJK specifically Asian is neither appropriate nor helpful, because Asia is
big and has hundreds of languages and scripts...
This is certainly a valid point. Far East or East Asian
Doug Ewell wrote:
Peter has an excellent solution -- much better than trying to
explain the term CJK to ordinary people -- and I plan to use the term
East Asian in the future.
But, if by East Asian you mean languages written with Han ideographs,
you fall into another pitfall, because
Doug wrote:
UTF-8 and UTF-32 should absolutely not be similarly hacked to maintain some
sort of bizarre compatibility with the binary sorting order of UTF-16.
UTC should not, and almost certainly will not, endorse such a proposal on the
part of the database vendors.
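The ordering problem behind the UTF-8s proposal can be shown in a few lines of Python. UTF-8 byte order follows code point order. UTF-16 code unit order is different: supplementary characters are encoded with surrogates in D800..DFFF, so they sort before BMP characters in E000..FFFF. The `utf8s` function below is a hypothetical sketch of the proposal's idea. It encodes each UTF-16 code unit separately, which is the scheme later documented as CESU-8 in UTR #26. It is not an implementation anyone on this thread posted.

```python
def utf8s(text: str) -> bytes:
    """Encode each UTF-16 code unit as if it were a code point (surrogates included)."""
    data = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        # "surrogatepass" lets Python's UTF-8 codec emit a lone surrogate's 3-byte form.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


bmp = "\uE000"       # BMP code point above the surrogate range
supp = "\U00010000"  # first supplementary code point

# Code point order: U+E000 < U+10000, and UTF-8 preserves it.
print(bmp.encode("utf-8") < supp.encode("utf-8"))          # True
# UTF-16 order flips it: 0xE000 > 0xD800, the high surrogate of U+10000.
print(bmp.encode("utf-16-be") < supp.encode("utf-16-be"))  # False
# The UTF-8s-style bytes sort the way UTF-16 does.
print(utf8s(bmp) < utf8s(supp))                            # False
```

For BMP-only text, the UTF-8s-style bytes are identical to UTF-8, so the difference is easy to miss. It only appears once supplementary characters are stored.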
I would be loath
From: Marco Cimarosti [mailto:[EMAIL PROTECTED]]
Doug Ewell wrote:
Peter has an excellent solution -- much better than trying to
explain the term CJK to ordinary people -- and I plan to use the term
East Asian in the future.
But, if by East Asian you mean languages written with
Roozbeh asked:
Why are the Braille characters classified as Other Neutrals regarding
bidi?
Because they were all given a general category of So (Symbol Other), and
the default bidi property for So is ON:
2801;BRAILLE PATTERN DOTS-1;So;0;ON;N;
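The UnicodeData.txt line quoted above has semicolon-separated fields. The first five are the code point, the name, the general category, the canonical combining class, and the bidi class. A minimal Python parser for those five fields might look like the sketch below. The `UcdEntry` type and the `parse_ucd_line` name are hypothetical. The printed values come from the quoted line, and later versions of the data file may not match them.

```python
from typing import NamedTuple


class UcdEntry(NamedTuple):
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str


def parse_ucd_line(line: str) -> UcdEntry:
    """Parse the first five semicolon-separated fields of a UnicodeData.txt line."""
    fields = line.strip().split(";")
    if len(fields) < 5:
        raise ValueError(f"too few fields: {line!r}")
    return UcdEntry(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        bidi_class=fields[4],
    )


entry = parse_ucd_line("2801;BRAILLE PATTERN DOTS-1;So;0;ON;N;")
print(f"U+{entry.code_point:04X} {entry.name}: gc={entry.general_category}, bidi={entry.bidi_class}")
# U+2801 BRAILLE PATTERN DOTS-1: gc=So, bidi=ON
```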
No one spoke out for any
Someone (clearly having Chinese roots) wrote me privately:
But, if by East Asian you mean languages written with Han ideographs,
you fall into another pitfall, because Mongolian, Russian, Vietnamese
and many
I don't think so. Mongolia is not in East Asia; it's in North Asia.
Russia
Antoine Leca wrote:
Jianping Yang wrote:
As a matter of fact, the surrogate or supplementary character was not defined
in the past,
How long is the past? I remember reading about these surrogates the first
time I put my hands on a draft copy of ISO 10646. It was nearly six years ago.
On 05/29/2001 12:55:19 PM Marco Cimarosti wrote:
But, if by East Asian you mean languages written with Han ideographs,
you fall into another pitfall, because Mongolian, Russian, Vietnamese and
many other languages spoken in East Asia aren't accounted for.
At least in academic contexts, or at
On Tue, 29 May 2001, Marco Cimarosti wrote:
Doug Ewell wrote:
Peter has an excellent solution -- much better than trying to
explain the term CJK to ordinary people -- and I plan to use the term
East Asian in the future.
But, if by East Asian you mean languages written with Han
I originally thought could be a way of storing Unicode text in databases.
However, after some thinking, I decided that idea was completely bogus, so I
thought to turn it into a joke for geeks. But it wasn't even amusing, so it
went in the Deleted Items folder.
However, I see that illogical ideas
Unicode 3.1 Technical Report #15, Annex 7
(http://www.unicode.org/unicode/reports/tr15/#Programming_Language_Identifiers)
contains the following remark:
Generally if the programming language has case-sensitive identifiers
then Normalization Form C may be used, while if the programming language
So I suggest correcting the problem before it comes out.
And I would like to propose UTF-32s.
I think this has been anticipated, possibly by some of the people who proposed UTF-8s.
My opinion, for what it's worth, is that there should be no new formats.
We have too many of them already, and making
On Tue, 29 May 2001, Marco Cimarosti wrote:
Doug Ewell wrote:
Peter has an excellent solution -- much better than trying to
explain the term CJK to ordinary people -- and I plan to use the term
East Asian in the future.
But, if by East Asian you mean languages written with Han
Ken,
UTF-8s is essentially a way to ignore surrogate processing. It allows a
company to encode UTF-16 with UCS-2 logic.
The problem is that by not implementing surrogate support you can introduce
subtle errors. For example it is common to break buffers apart into
segments. These segments may
On Tue, 29 May 2001, Jungshik Shin wrote:
On Tue, 29 May 2001, Marco Cimarosti wrote:
Doug Ewell wrote:
Peter has an excellent solution -- much better than trying to
explain the term CJK to ordinary people -- and I plan to use the term
East Asian in the future.
you fall in
Thomas Chan wrote:
There are many pitfalls. Does the definition exclude Korean when written
solely in Hangul? Is Vietnamese clearly East Asian? How about Yi
(TUS3.0 thinks so)?
Whoa, wait a minute. Let's not extrapolate too much from some pragmatic
decisions that were taken to divide up
On 05/29/2001 02:46:48 PM Achim Ruopp wrote:
Generally if the programming language has case-sensitive identifiers
then Normalization Form C may be used, while if the programming language
has case-insensitive identifiers then Normalization Form KC may be more
appropriate.
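The practical difference between the two forms for identifiers shows up with compatibility characters. This sketch uses Python's standard `unicodedata.normalize`. The helper `normalize_identifier` is hypothetical and simply applies the TR15 remark as quoted. It does not do any case folding of its own.

```python
import unicodedata


def normalize_identifier(name: str, case_sensitive: bool) -> str:
    """Pick NFC for case-sensitive identifiers and NFKC otherwise, per the TR15 remark."""
    form = "NFC" if case_sensitive else "NFKC"
    return unicodedata.normalize(form, name)


# A decomposed e + COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
print(normalize_identifier("cafe\u0301", case_sensitive=True) == "caf\u00e9")  # True

# The compatibility ligature U+FB01 survives NFC but becomes plain "fi" under NFKC.
print(normalize_identifier("\ufb01le", case_sensitive=True))   # ﬁle
print(normalize_identifier("\ufb01le", case_sensitive=False))  # file
```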
If I'm not mistaken
On 05/29/2001 02:37:55 PM Thomas Chan wrote:
I think what one wants is something like languages usually and currently
possibly including Han characters in their written form. That frees us
from worrying about historical or aberrant cases, I think.
Folks, this discussion was about how to label
Marco asked:
I have a question about the file
http://www.unicode.org/Public/UNIDATA/Scripts.txt, the data file for
UTR#24 (Script Names).
I see that script-specific combining characters are normally assigned to
that script. However, a few of them are in the INHERITED class:
Are these
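A file laid out like Scripts.txt has a code point or range, a semicolon, the script name, and an optional comment. Looking up a code point's script in such a file can be sketched as below. The helper names, the sample lines, and the `UNKNOWN` default for unlisted code points are all assumptions for illustration. The field layout should be checked against the actual file.

```python
from bisect import bisect_right


def parse_scripts(lines: list[str]) -> list[tuple[int, int, str]]:
    """Parse 'XXXX..YYYY ; NAME # comment' lines into sorted (lo, hi, name) tuples."""
    ranges = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # strip any trailing comment
        if not line:
            continue  # skip blank or comment-only lines
        cps, name = (field.strip() for field in line.split(";", 1))
        lo_hex, _, hi_hex = cps.partition("..")
        lo = int(lo_hex, 16)
        hi = int(hi_hex, 16) if hi_hex else lo
        ranges.append((lo, hi, name))
    ranges.sort()
    return ranges


def script_of(cp: int, ranges: list[tuple[int, int, str]], default: str = "UNKNOWN") -> str:
    """Binary-search the sorted ranges for the one containing cp."""
    starts = [lo for lo, _, _ in ranges]
    i = bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default


# Illustrative sample lines in the Scripts.txt layout (not copied from the file).
sample = [
    "0041..005A    ; LATIN # uppercase A-Z",
    "0061..007A    ; LATIN # lowercase a-z",
    "0300..036F    ; INHERITED # combining diacritical marks",
]
table = parse_scripts(sample)
print(script_of(0x0301, table))  # INHERITED
print(script_of(0x0041, table))  # LATIN
print(script_of(0x0030, table))  # UNKNOWN
```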
On Tue, 29 May 2001 [EMAIL PROTECTED] wrote:
On 05/29/2001 02:37:55 PM Thomas Chan wrote:
I think what one wants is something like languages usually and currently
possibly including Han characters in their written form. That frees us
from worrying about historical or aberrant cases, I
Carl,
Ken,
UTF-8s is essentially a way to ignore surrogate processing. It allows a
company to encode UTF-16 with UCS-2 logic.
The problem is that by not implementing surrogate support you can introduce
subtle errors. For example it is common to break buffers apart into
segments.
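The buffer-splitting hazard can be illustrated with a short Python sketch that treats a string as a list of UTF-16 code units. A UCS-2-style split at a fixed unit count can cut a surrogate pair in half. Backing off by one unit when the cut would land between a high and a low surrogate keeps both segments well-formed. All helper names here are hypothetical.

```python
def to_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def from_units(units: list[int]) -> str:
    """Decode UTF-16 code units; a broken surrogate pair raises UnicodeDecodeError."""
    return b"".join(u.to_bytes(2, "big") for u in units).decode("utf-16-be")


def safe_split(units: list[int], limit: int) -> int:
    """Return a cut index <= limit that never separates a surrogate pair."""
    limit = max(0, min(limit, len(units)))
    if (0 < limit < len(units)
            and 0xD800 <= units[limit - 1] <= 0xDBFF
            and 0xDC00 <= units[limit] <= 0xDFFF):
        return limit - 1  # back off so the high surrogate stays with its low surrogate
    return limit


units = to_units("ab\U00010000c")  # 0061 0062 D800 DC00 0063
try:
    from_units(units[:3])          # naive cut after three units
except UnicodeDecodeError:
    print("naive split left a lone high surrogate")

cut = safe_split(units, 3)
print(cut)  # 2
print(from_units(units[:cut]), from_units(units[cut:]))
```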
Ken,
I suspect that Oracle is specifically pushing for this standard because of
its unique database design. In a sense Oracle almost picks itself up by
its own bootstraps. It has always tried to minimize actual code. Therefore
it was a natural choice to implement Unicode with UTF-8 because
So say "Han font" or "Hanzi font".
★じゅういっちゃん★
EKYWY TXLY NPZ P MPVD XPHYV LPWWQY
NKT ZPN XT WYPZTX PE PMM ET HPWWD
"EYX EKTSZPXV'Z HTWY GSX
P XSHOYW EKPX TXY
PXV LTHHQEHYXE, ET HY, QZ RSQEY ZLPWD"
You can just say Screw the number 8, let's use 21-bit bytes.
--- Original Message ---
From: "Carl W. Brown"
Actually it would be more accurate to say that geographic expressions
involving cardinal points without an _explicit_ point of reference are
biased, because they traditionally assume that Europe is the _implicit_
point of reference. Hence, Far East, Orient, Near East (or Middle
East) are biased
David Gallardo scripsit:
Actually it would be more accurate to say that geographic expressions
involving cardinal points without an _explicit_ point of reference are
biased, because they traditionally assume that Europe is the _implicit_
point of reference. Hence, Far East, Orient, Near
Please excuse the unintended querulousness, but isn't the Greenwich meridian
merely the reification of this bias?
The Greenwich meridian division was established in 1884 by representatives
from 25 countries, mostly from Europe and the Americas. Though there were,
notably, representatives from
David Gallardo scripsit:
Please excuse the unintended querulousness, but isn't the Greenwich meridian
merely the reification of this bias?
Sure. Ditto the Gregorian calendar, and the decimal digit system, and
many other international standards. But they *are* standards.
--
John Cowan