Rampshot asked...
> How do I look up a han character if I don't know its codepoint?
> What if all I have is its shape, or its EUC-JP or Shift-JIS number?
> There are a couple I want to see.
The people at Sanseido have just now made it
Will someone PLEASE send this boy a book!?
Rick
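Rick's reply is a joke, but the codepoint half of the question has a mechanical answer. If you have the raw EUC-JP or Shift-JIS bytes, a converter can turn them into a Unicode code point and name, which you can then find in the charts. Below is a minimal sketch that assumes Python 3 and its standard `shift_jis` and `euc_jp` codecs. `lookup_legacy` is a hypothetical helper name. It does not help with lookup by shape, which still needs a radical/stroke index (or the book).

```python
import unicodedata

def lookup_legacy(hex_code: str, encoding: str) -> str:
    """Decode a legacy double-byte code (e.g. '82A0') and report its Unicode code point.

    hex_code is the character's byte sequence written as hex digits;
    encoding is a Python codec name such as 'shift_jis' or 'euc_jp'.
    """
    char = bytes.fromhex(hex_code).decode(encoding)  # raises UnicodeDecodeError if unmapped
    if len(char) != 1:
        raise ValueError(f"{hex_code} decodes to {len(char)} characters, not one")
    return f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}"

if __name__ == "__main__":
    print(lookup_legacy("82A0", "shift_jis"))  # Shift-JIS 0x82A0
    print(lookup_legacy("A4A2", "euc_jp"))     # EUC-JP 0xA4A2
```

Both calls print `U+3042 HIRAGANA LETTER A`, because the two legacy numbers denote the same kana.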
Begin forwarded message:
From: [EMAIL PROTECTED]
Date: Sat, 01 Jul 2000 02:49:30 -0800 (GMT-0800)
To: Unicode List [EMAIL PROTECTED]
Subject: Furigana codes?
X-UML-Sequence: 14481 (2000-07-01 10:49:31 GMT)
Are there
There are lots of Unixes:
http://www.columbia.edu/kermit/unix.html
How many of them have an iconv function?
rangda 47: man iconv
man: no entry for iconv in the manual.
rangda 48: cat /etc/motd
Welcome to Darwin!
rangda 49: well, hmmm...
zsh: command not found: well,
rangda 50:
BTW, did anyone get the smileys right at the first sight?
--roozbeh
Yes, the mail viewer here supports UTF-8. Therefore, I saw two glyphs from Apple's
"Last Resort" font which tells me that I don't have any other installed font capable
of displaying the smiley faces... Bummer. :-(
(BTW, I
I do not suppose that characters of 128+ strokes are indeed
possible, due to the fact that the paper would get quite soggy
from the repeated strokes.
Well, if they get soggy on little paper just write 'em on bigger paper!
In any case, your supposition is not adequately informed. For
Would anyone like to please translate that into Chinese for the benefit of future
generations?
Rick
1) The UTF whose bits can be counted is not the eternal UTF.
OK - Is there any currently available solution that permits the unique
Maltese characters 'gh' and 'ie' to be represented?
But... you already use "gh" and "ie" digraphs to represent these letters of the
alphabet, and if you have any software capable of processing the data, then it must be
Marco said:
These are at most the building blocks for braille. A better parallel
would be to consider these "presentation glyphs" for braille. (But I think
that the main reason why these patterns are in Unicode is to encode runs of
braille-looking characters in didactic texts for *sighted*
I bet few things would be rarer than,
say, a Georgian female rap CD in the US!!
Tobacco chewing killer whales in Piccadilly Circus, surely.
Would somebody PLEASE tell me, IN THE DEFAULT UNICODE
COLLATION ALGORITHM, WHAT COMES AFTER WHAT?!
Read the technical report! (It's available
Mark said...
If some noble soul volunteers to act as a sports reporter, I'm sure we
can work up something. It's probably a bit much to web-cam it, but that
may come in the future.
Well wait a minute!...
If someone posts the answers, then we can't re-use the same questions next year!
Erik Lindberg asked...
Unicode seems to suggest using the combination: 0BA3+0BBE (NNA+AA).
However the resulting representation of the digraph is not the one
found in literature.
What system are you running on? Whose font? Which application(s)?
There are other characters too in the Tamil
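Rick's questions point at the font and renderer, not the encoding. This minimal sketch uses only the Python 3 standard library. It shows that the sequence Erik quoted is just two code points, and that normalization does not merge them into a single "NNAA" character. The traditional ligated shape therefore has to come from the font and rendering software.

```python
import unicodedata

# The digraph in question: TAMIL LETTER NNA followed by TAMIL VOWEL SIGN AA.
nna_aa = "\u0BA3\u0BBE"

for ch in nna_aa:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFC leaves the pair as two code points: there is no precomposed character,
# so the ligature is purely a rendering matter.
nfc = unicodedata.normalize("NFC", nna_aa)
print(len(nfc), nfc == nna_aa)
```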
"Michael (michka) Kaplan" [EMAIL PROTECTED] wrote:
Actually, Apurva just did explain it and since she comes from a
typography background she did explain how the whole problem can be handled
via fonts. :-)
Yes, thanks. I saw the explanation after...
However, it cannot currently be
Oh Michael...
I think there are codes given to entities in the Ethnologue list that
aren't languages in the sense that we need to identify languages in IT
and in Bibliography
ISO 639, and every other "standard" for language/locale codes also has this problem,
and from what I remember of the
Re the Linguasphere, Peter C wrote:
- As Chris mentioned, the info isn't available online.
Actually, the Linguasphere is available on-line, if you pay for it... One hundred
sixty pounds sterling (two hundred seventy-five US dollars) for a license to use the
electronic version.
Rick
Otto Stolz wrote:
I think, the ethnologue lacks information about variant orthographies.
Yes, it does. But that's OK, because we can make a composite tagging system that tags
orthography separately from language.
So... does anyone have a comprehensive list of orthographies?
Rick
Tom Emerson wrote:
One (well, the only) problem I have with explicit orthographic tagging
is that it makes assumptions that a consistent orthography is being
used throughout a document, which isn't necessarily the case. This is
particularly prevalent in East Asian languages:
Well, the tags
My point is that for some languages there is no single
orthography that can ever be nailed down.
Yes of course. But there's nothing to prevent the development of a system of
orthographic tags, and nothing to prevent combining orthographic tags with language
tags for complete mix-and-match
Where have you erred? That page isn't encoded in UTF-8! Setting your browser to
interpret the page as UTF-8 won't work if the page isn't in UTF-8. The page appears
to be in ISO 8859-1 but doesn't declare a charset... I would have expected Yahoo to
send charset headers, but I don't see one
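You can see Rick's diagnosis at the byte level. Text written as ISO 8859-1 is usually not valid UTF-8, so forcing a UTF-8 interpretation fails. This is a minimal sketch assuming Python 3.9+. `decode_page` is a hypothetical helper, and its fall-back-to-8859-1 strategy is just one common heuristic, not what any particular browser does.

```python
def decode_page(raw: bytes) -> tuple[str, str]:
    """Decode an unlabeled page: try strict UTF-8 first, fall back to ISO 8859-1.

    Heuristic only: every byte sequence decodes as ISO 8859-1, so the fallback
    never fails, but it can silently produce the wrong text.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"

for raw in ("café".encode("iso-8859-1"), "café".encode("utf-8")):
    text, used = decode_page(raw)
    print(raw, "->", used, len(text))  # both decode to the same 4-character string
```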
Yup, I think Otto is right... Just nodding my agreement with the trend...
2311 Square Lozenge
Best wishes,
Otto Stolz
The only exception I am aware of to this rule is
the OmniWeb application which runs only on Mac OS X.
One disadvantage of OmniWeb, by the way, for international use, is that it requires
that you set (in a preference panel) the encoding it uses for pages that it renders;
it doesn't know
Mike Ayers wrote:
I am aware that there are European languages (swiss and italian?)
that group four digits, and am reasonably sure that japanese does.
Japanese? I don't think so.
Rick
1. Is a halant/virama ever valid following other than a consonant (or
consonant and nukta)?
Legal? In the sense of "any string is legal", yes; as is anything else. The
implementation question to answer is whether it's useful or renderable, and if so, how.
The independent vowel followed by
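Rick's point is that "legal" and "renderable" are separate questions. The sketch below is a lint-style check, limited to Devanagari and assuming Python 3.9+. It uses the Devanagari consonant rows (U+0915..U+0939 and the nukta forms U+0958..U+095F), and `unusual_viramas` is a hypothetical name. It rejects nothing. It only reports where a virama follows something other than a consonant, optionally with a nukta.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
NUKTA = "\u093C"   # DEVANAGARI SIGN NUKTA

def is_consonant(ch: str) -> bool:
    """Devanagari consonants, including the precomposed nukta forms U+0958..U+095F."""
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F

def unusual_viramas(text: str) -> list[int]:
    """Return indices of viramas NOT preceded by a consonant or consonant+nukta.

    Such sequences are still legal Unicode text; this only flags spots a
    renderer may not know how to draw.
    """
    flagged = []
    for i, ch in enumerate(text):
        if ch != VIRAMA:
            continue
        prev = text[i - 1] if i >= 1 else ""
        if prev == NUKTA:  # skip over a nukta to the base it modifies
            prev = text[i - 2] if i >= 2 else ""
        if not (prev and is_consonant(prev)):
            flagged.append(i)
    return flagged

print(unusual_viramas("\u0915\u094D\u0937"))  # KA + virama + SSA -> []
print(unusual_viramas("\u0905\u094D"))        # independent A + virama -> [1]
```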
[EMAIL PROTECTED] wrote:
Unfortunately, there's no corresponding LATIN CAPITAL LETTER N WITH LONG
RIGHT LEG, which Lakota needs.
To my knowledge, the discussion in September between John Cowan and Curtis Clark
didn't terminate with any actual proposal, and I'm not clear on whether the above
Mike Ayers wrote:
The last I knew,
computer-savvy Taiwan and Hong Kong were continuing to invent new
characters. In the end, the onus is on the computer to support the user.
Yes, the computer should support the user, but... The invention of new characters to
serve multitudes is OK, and
For what it's worth, in this oh-so-important discussion... I have seen this length
mark used with both Katakana and Hiragana (I suppose that puts me in the good company
of 'Leven Digit Boy, only he can prove it and I can't). Call the usage nonce or
whatever... So what? It would be fair to
The Venerable Dr Whistler wrote:
I'm sure there is, but I can't lay hands on it right at the moment.
It's sitting in a box in the basement somewhere.
Uh... He probably meant to write:
"Yes, it's right here ahem as you can see from Diagram 7,
it's part of the thin banded layer right above
Everson opined:
But I suspect he didn't write it.
It looks very much like the kind of thing an enthusiastic
second-year university student would write as a term paper.
If Alain wrote that diatribe, he should have said so to avoid any such
questions. Otherwise, it should not have been
Elaine... Quick reply, sorry. I should be more verbose, but I hope
others can chime in.
Is Unicode's so-called "bidi algorithm" really bidirectional, that is,
does it govern horizontal text layout in right-to-left and left-to-
right languages?
Yes.
Or is "bidi" a metaphor here, for
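Behind Rick's "Yes" is a per-character property: the algorithm starts from each character's Bidi_Class. This minimal sketch uses Python's standard `unicodedata.bidirectional`. It shows the classes for a Latin letter, a Hebrew letter, an Arabic letter, a European digit and a space. Computing the actual display order is a much larger job (UAX #9) and is not attempted here.

```python
import unicodedata

# Bidi_Class of a few characters: L = left-to-right, R = right-to-left (Hebrew),
# AL = Arabic letter, EN = European number, WS = whitespace.
for ch in ["A", "\u05D0", "\u0628", "1", " "]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):24} {unicodedata.bidirectional(ch)}")
```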
Elaine wrote:
Chinese and Japanese newspapers are still mostly written in a vertical,
frequently right-to-left, boustrophedon.
No, not exactly. They don't go "as the ox plows", and it is entirely
improper to utilize the term "boustrophedon" to refer to them. They are
written in columns,
The question that I keep asking is who wrote this missive, and if Alain
didn't write it, where did he get it? That's the most basic question I
had.
Rick
Let me throw my light weight in with John O'Conner...
It's silly to even consider Klingon for Unicode or 10646. Many members of
both committees know this, and that's why it hasn't moved anywhere in
several years. The question keeps cropping up because that silly proposal
is still "on the
thejokrishna wrote:
Hi all,
Can you please point out web applications which entirely support
UTF-8? i.e an application which takes Unicode characters as input,
stores them in a database and retrieves them.
Please see the Unicode web pages. There is a lot of information about
applications
"G. Adam Stanislav" [EMAIL PROTECTED] wrote...
I believe there are other *human* scripts that need to be encoded
in Unicode before Klingon (is Mayan encoded yet, for example?).
I sympathise with the general sense here, but Mayan isn't a great example.
Mayan is dead, as are many other
It has always been my impression that the dz and other digraphs were
included ONLY because they existed in standards that were used as source
material by the Unicode designers. Such digraphs would not have been
encoded otherwise.
Rick
Adam mentions the Latin digraphs encoded for
P. Andries asked:
1) Where is the Gregorian punctum (square dot)? Is it unified with another
dot, another shaped note (U+1D147)? If so, why?
I am double-checking, but I believe it's unified. I'll have more info later.
2) How would a triplet (a group of three notes to be performed in
Lukas P said:
I'd be interested to learn the rationale behind these choices. Is the
original proposal available anywhere?
Try:
http://viva.lib.virginia.edu/dmmc/Music/UnicodeMusic/
That's Perry Roland's original proposal, with a lot of examples. I'm not
sure you'll get much
Why are the punctum and semi-brevis unified with U+1D147 and U+1D1BA
since, unless I err, they do not share the same value but only a
visual similarity
Well... the rationale for that would be the same thing that unifies the
"." in "3.14" and "Mr. Fung".
However, in this case, it's true,
Markus complained:
Thai is not stored/used in logical order in Unicode.
Here's my contribution to the FAQ about Thai:
Q. Why isn't Thai stored/used in logical order in Unicode?
A. Once upon a time, the Unicode fore-parents inherited the Thai
industrial standard, which is an 8-bit standard
Mike wrote:
In particular step 5 should be made required instead optional.
Eh? And deprive the committees of the pleasure of endlessly debating the
one true shape of the unspecified glyph???
Rick
Roozbeh asked...
Would you please give me the reference? I once heard this, but after
seeing a new proposal for "Arabic Tail Fragment" approved by UTC to be
encoded in "Arabic Presentation Forms-B" block (SC2/WG2 document N2322), I
thought I was wrong.
That proposal and this follow-on
Roozbeh wrote...
Oh, I just found it! It's also encoded as a character in the national
standard ISIRI 2900, dated 1989 (which is a 7-bit character set standard).
I will update the proposal. So you can be sure that you have not disobeyed
the rules ;)
Oh good! Nice bit of research...! This
users who have the most interest vested in
the encoding are the scholars themselves (and they are saying the state of
the art prevents a useable encoding at the time)
I don't think it's all scholars who have objected to the Egyptian
proposal. But this is a case where there appears to be no
Doug Ewell quoted:
By convention, the Private Use Area is divided into a Corporate Use subarea,
starting at U+F8FF and extending downward in values, and an End User subarea,
starting at U+E000 and extending upward.
Then Michael Everson wrote:
This has nothing to do with ISO/IEC 10646.
Who
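The quoted convention only says which end of the BMP Private Use Area each party should start from. As Michael notes, ISO/IEC 10646 says nothing about it, and the character data makes no such distinction either: every private-use code point has General_Category Co. This minimal sketch (Python 3 standard library) checks whether a code point is private-use at all, either in the BMP area or in the supplementary areas of Planes 15 and 16. `pua_area` is a hypothetical helper.

```python
import unicodedata
from typing import Optional

# Private Use Areas: one in the BMP plus (nearly) all of Planes 15 and 16.
PUA_RANGES = {
    "BMP": (0xE000, 0xF8FF),
    "Plane 15": (0xF0000, 0xFFFFD),
    "Plane 16": (0x100000, 0x10FFFD),
}

def pua_area(cp: int) -> Optional[str]:
    """Return which Private Use Area a code point falls in, or None."""
    for area, (first, last) in PUA_RANGES.items():
        if first <= cp <= last:
            return area
    return None

for cp in (0xE000, 0xF8FF, 0xF0000, 0x0041):
    # "Corporate" U+F8FF and "end user" U+E000 look identical to software: both are Co.
    print(f"U+{cp:04X}", pua_area(cp), unicodedata.category(chr(cp)))
```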
1. The first one is an Arabic Subscript Alef,
I thought we had that one... but I don't find it among the Koranic annotations.
3. The most weird of all, was that after finding all the dingbats and
weird shapes, one was missing: a White Square Containing White Small
Square (compare with
There has been a lot of recent discussion about various uses of the PUA.
Can someone point to widespread instances of confusion and chaos right now
over PUA usage? I don't think there is any.
It seems to me there's a lot of effort being expended to engineer the
regulation of something that
William Overington wrote...
So, when Ken states the sentence above, is that Ken writing as a private
individual ... or Ken writing as a Technical Director
...
... there exists scope for considerable confusion as to the
provenance of a statement made on this list where members of the unicode
Peter said:
2. How do I get software X to know how to process my PUA characters, or how
do I document my characters for others to understand my data?
Michael replied...
In principle it would work, if the OSes are being written to handle user
editing of such things. Ten euros sez they ain't.
Marco Cimarosti wrote:
East Asian Width is a property that tells whether or not each Unicode
character should have the same typographical width as a CJK ideograph. The
property may be yes, no, or a few different kinds of maybe.
Whoa, wait... Whether or not you care at all about the East
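In the data file, Marco's "few different kinds of maybe" are distinct property values. This minimal sketch uses Python's `unicodedata.east_asian_width` to show one character from each clear-cut class. Two values remain: A (ambiguous: wide in East Asian legacy contexts, narrow elsewhere) and N (neutral: characters that don't occur in legacy East Asian sets).

```python
import unicodedata

# East_Asian_Width values: W (wide), F (fullwidth), H (halfwidth),
# Na (narrow), A (ambiguous) and N (neutral).
for ch in ["\u4E2D", "\uFF21", "\uFF71", "a"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):40} {unicodedata.east_asian_width(ch)}")
```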
じゅういっちゃん wrote:
Some people said things like...
There was another abomination proposed.
I was choosing not to mention the abominable.
The abominable steam-rollers of history squish those who don't scream and
run; and the few weak survivors are forever cleaning up the resulting
messes.
If you think
So I suggest to correct the problem before it came out.
And I would like to propose UTF-32s.
This has been anticipated, I think, by some people who proposed UTF-8S.
My opinion, for what it's worth, is that there should be no new formats.
We have too many of them already, and making
The main difference from SCSU is that this method preserves binary order.
Ah. And which binary order does it preserve?
The right one, or the other one? ;-)
Rick
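Rick's joke has a real answer behind it. UTF-8 sorts byte-wise in code point order, while UTF-16 code units do not, because surrogates (D800..DFFF) sort below BMP characters in E000..FFFF. That mismatch motivated the UTF-8S proposal mentioned above. Here is a minimal sketch in Python 3 using only the built-in codecs:

```python
# One BMP character above the surrogate range and one supplementary character.
chars = ["\uFF61", "\U00010000"]  # HALFWIDTH IDEOGRAPHIC FULL STOP, LINEAR B SYLLABLE B008 A

by_code_point = sorted(chars, key=ord)
by_utf8 = sorted(chars, key=lambda c: c.encode("utf-8"))
by_utf16 = sorted(chars, key=lambda c: c.encode("utf-16-be"))

print([f"U+{ord(c):04X}" for c in by_code_point])  # ['U+FF61', 'U+10000']
print([f"U+{ord(c):04X}" for c in by_utf8])        # ['U+FF61', 'U+10000']
print([f"U+{ord(c):04X}" for c in by_utf16])       # ['U+10000', 'U+FF61']
```

UTF-8 bytes EF BD A1 sort before F0 90 80 80. In UTF-16, however, the surrogate D800 sorts before FF61, which reverses the order.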
Toby, I think you forgot to comment on these objections that have also
been coming up from time to time:
* Introduction of UTF-8S would merely add to the myriad forms people would
already have to support, and it is insufficiently distinguishable from
UTF-8.
* encoding ambiguities in the
Hi Bev --
Does anyone know if the Lushootseed language is included in Unicode?
I searched but could not find it if you have an URL could you
please send. Thanks in advance.
As others will no doubt tell you, this standard encodes _characters_ used
for writing; it doesn't encode
Michael Kaplan [EMAIL PROTECTED] wrote:
... asking for a lavicious license to be lecherously lazy
Parse error at lavicious. No such word appears in any English
dictionary I own, not even the OED.
Rick
I only have one question. What do blueberries have to do with XML?
Rick
Gaute B Strokkenes wrote...
[I'm cc:-ing the unicode list to make sure that I've gotten my
terminology right, and to solicit comments
Interesting... I just started looking at Python the other day, once I
discovered it has such nice built-in Unicode support.
If Python is explicitly storing
Martin v. Loewis [EMAIL PROTECTED] wrote:
It seems to be unclear to many, including myself, what exactly was
clarified with Unicode 3.1. Where exactly does it say that processing
a six-byte two-surrogates sequence as a single character is
non-conforming?
It's not non-conforming, it's
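Here is Martin's question in concrete form. A supplementary character such as U+10000 takes four bytes in UTF-8, whereas the "six-byte two-surrogates" form encodes each UTF-16 surrogate separately as three bytes. This minimal sketch (Python 3.8+) shows how one standard library treats the two forms. Python's strict behavior is one implementation's choice and is not offered here as a ruling on conformance.

```python
supplementary = "\U00010000"

# Standard UTF-8: a single four-byte sequence.
print(supplementary.encode("utf-8").hex(" "))  # f0 90 80 80

# The six-byte form: surrogates D800 and DC00, each encoded as three bytes.
six_byte = bytes.fromhex("eda080edb080")

# Python's strict UTF-8 decoder rejects it outright...
try:
    six_byte.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError as err:
    strict_rejects = True
    print("strict decoder rejects it:", err.reason)

# ...but with 'surrogatepass' it yields two lone surrogates, which can then be
# paired into one code point by round-tripping through UTF-16.
lone = six_byte.decode("utf-8", "surrogatepass")
print([f"U+{ord(c):04X}" for c in lone])  # ['U+D800', 'U+DC00']
paired = lone.encode("utf-16-be", "surrogatepass").decode("utf-16-be")
print(paired == supplementary)  # True
```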
I don't think there's any point in encoding 64 hexagrams; especially when
we have the pieces already. Use the pieces of three and position them with
a drawing program. We don't have combining thingies for putting chess
pieces on board squares, either.
Rick
Thomas Chan wrote...
I'd like to ask about the encoding status of the Japanese Jindai
scripts, which are mentioned in older documents[1], and until a certain
point in time, versions of the Roadmap.
Do you have a paper on the topic? You say over a dozen 'Jindai'
scripts. What does this
Thank you Kay Genenz. This web page is helpful. I was not aware of any
of this info. I'm not surprised they disappeared from the roadmap.
Should one consider the Chinese oracle bone
inscriptions (1200 BC) for entry to the unicode list?
They really did exist.
They are unified with the
Unfortunately, you don't hear much about SCSU, and in particular the Unicode
Consortium doesn't really seem to promote it much (although they may be
trying to avoid the too many UTF's syndrome).
Probably that's one point. But also, SCSU is something that's a little more
complicated to
Doug Ewell wrote...
★じゅういっちゃん★
私はろこえんらかべさ。
Robert, please stop this. It doesn't seem to be UTF-8 (that is, I can't copy
and paste it into UniPad or Windows 2000 Notepad and see anything
reasonable)
Eeek.. What's that? 11's comment shows up fine in my mail
Watashi wa loco en la cabeza
Duh, well, use katakana as appropriate, use middle-dots between your foreign
words, and people might get it.
Rick
Thanks to a few people who gave me the answer. I keep forgetting that there
are so many multiple romanizations; I didn't try the other romanization,
but was trying to type dzu (voiced tsu), and just about everything else.
Thanks.
By the way, in case anyone is curious... Why does anyone
Carl Brown suggested:
If you convert to iso-8859-1 you lose characters that is just as bad as
sending Windows-1252 out as iso-8859-1.
Well... If the author converts to ISO 8859-1 on the way out, the author
might lose characters. If you send 1252 labelled as 8859 to the world,
everyone
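Here are the two failure modes Carl and Rick contrast, at the byte level. This is a minimal sketch in Python 3 using only the standard codecs, and the sample bytes are made up for illustration.

```python
raw = b"\x93smart quotes\x94 cost \x80 5"

# Read as Windows-1252, 0x93/0x94 are curly quotes and 0x80 is the euro sign.
as_cp1252 = raw.decode("cp1252")
print([f"U+{ord(c):04X}" for c in as_cp1252 if ord(c) >= 0x80])  # ['U+201C', 'U+201D', 'U+20AC']

# Labelled ISO 8859-1, the same bytes become invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")
print([f"U+{ord(c):04X}" for c in as_latin1 if ord(c) >= 0x80])  # ['U+0093', 'U+0094', 'U+0080']

# Converting the other way, 8859-1 has no slot for the euro sign, so it is lost.
print("€ 5".encode("iso-8859-1", errors="replace"))  # b'? 5'
```

The first problem hits every reader of mislabelled text. The second only affects the characters that 8859-1 genuinely lacks.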
Caveat: This will only be of interest to Japanese speakers, so you can hit
delete now if you're not interested.
TOYOSHIMA Masayuki pointed at:
http://www.asahi-net.or.jp/~lf4a-okjm/genkan61.htm
Oh! How interesting! Exactly what I needed.
The link to:
Was such encoding done due to some historical reasons in the past?
Yes. The rules for future allocation were formulated many years after
thousands of questionable codepoints were already encoded in the early
days. Usually, these things (presentation forms, ligatures, compatibility
P. Andries wrote:
I'm still interested by a definition of in(-)line software
(http://www.unicode.org/unicode/reports/tr27/). I know what inline code
or processing could be but I can't quite understand the relationship
with the inline software mentioned here and processing music text.
The
This table was generated by the Unicode group for use with TrueType
and Unicode
What an unfortunately worded comment. They probably mean the group of
people inside MS who worked out the list of codes used by the RTF
implementors. They certainly don't mean Unicode, Inc.
(Does the
Jungshik Shin [EMAIL PROTECTED] wrote:
I put up a screenshot of glyphs for GooGyeol
characters included in one of fonts mentioned above at
http://jshin.net/~jungshik/i18n/googyeol.png
Looks to me like everything, or nearly everything, in that list is just a
brush-style rendering of a
One of the questions asked most frequently is whether Unicode encodes some
particular language. As most of you know, Unicode doesn't encode
languages, it encodes scripts. But the thing people often most want to
know is whether their language, or some other language, can be represented.
On 07/31/2001 05:58:57 AM Kairat A. Rakhim wrote:
Cherkessian, Crimean Tatar, Kumyk, Nivkh are not yet presented in the
list.
Peter C responded:
It's my understanding that the Nivkh Cyrillic writing system requires a
couple of characters that are not yet in Unicode.
Can someone propose
Jaipal K. asked:
1) How exactly do I use the Unicode standard?
That depends on what you want to do with it. In Java, characters are
Unicode characters anyway, so you do nothing special. Java is a pretty good
choice for an underlying language.
But for basic questions about Unicode
[EMAIL PROTECTED] wrote:
The existence of the byte sucks.
Well, I suggest therefore that you do Civilization a favor and incidentally
leave your indelible Mark on History by devoting every waking moment of the
rest of your life to stamping out the accursed byte.
Rick
If ISCII is still being developed does this suggest that Unicode and its ISO
equivalent move too slowly?
ISCII dates back to 1988 with a revision in 1990. It's not still being
developed -- as far as I know, it's a stable standard that is under
routine maintenance.
I wonder if anyone has
For some reason, the following note from Mark Davis appears to have been
lost in space.
Rick
--
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Date: 09-19-2001 15:51
From: Mark Davis/Cupertino/IBM@IBMUS
Subject: DerivedAge.txt
At the request of someone working with ICU, I
Here we go again... Before everyone goes off and starts blaming Unicode
for bad rendering...
When you render a combining character sequence and it doesn't look right
that is not the fault of the Unicode Standard, it is the fault of your
font and/or rendering software (and the people who
Some brief and not complete answers follow.
I'm trying to get a grasp on exactly how many planes
are defined in Unicode
[...]
How many planes are defined in Unicode 3.1?
There are 17 planes, and everything will be re-written to reflect that,
eventually. Most of the planes are empty
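The "17 planes" figure is simple arithmetic. The code space runs from U+0000 to U+10FFFF, and each plane holds 0x10000 code points, so a code point's plane number is the value shifted right by 16 bits. Here is a minimal sketch in Python, with `plane` as a hypothetical helper:

```python
MAX_CODE_POINT = 0x10FFFF

def plane(cp: int) -> int:
    """Plane number of a code point: each plane holds 0x10000 code points."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    return cp >> 16

print(plane(0x0041))              # 0  (Basic Multilingual Plane)
print(plane(0x1D100))             # 1  (e.g. the musical symbols)
print(plane(0xE0001))             # 14
print(plane(MAX_CODE_POINT) + 1)  # 17 planes, numbered 0..16
```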
Anyone knows where I could find an online chart of the International
Phonetic Alphabet encoded in Unicode (plain text or HTML)?
Thanks in advance.
_ Marco
Try the charts!
http://www.unicode.org/charts/
Rick
James Naughton wrote...
The most authoritative-sounding page on the web which I could find when I
was investigating this was an article on diacritics by J. C. Wells,
University College, London:
http://www.phon.ucl.ac.uk/home/wells/dia/diacritics-revised.htm
He writes:
The term 'caron',
The correct site for the Shusha font for Devnagari is
http://www.bharatbhasha.com/
And by the way, that site wants Visual Basic Scripting support, so you
can't view it in Netscape at all...
Rick
Nick, et al -- You mentioned:
> In Classical scholarship (and I suspect, beyond it), all
> four possible corner brackets are routinely used as punctuation
> to delimit text in some way ---
I saw your examples of these the other day in Greek text. The upper corners also occur widely. For
Doug Ewell wrote...
Cyrillic was created as a better way to write Slavic languages, Russian in
particular. Shavian and Deseret were created as better ways to write
English. The former met with overwhelming success, the latter did not
It's usual to bind former and latter to the closest
There is code for doing UTF8/16/32 conversions:
ftp://www.unicode.org/Public/PROGRAMS/CVTUTF
Rick
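For readers who only want the core of such a conversion, the UTF-16 part reduces to a few lines of arithmetic. This minimal Python sketch applies the standard surrogate-pair formulas. It is not a port of the CVTUTF code, and the function names are hypothetical.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000  # 20 bits remain: 10 for each surrogate
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

def from_surrogates(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1D11E)  # MUSICAL SYMBOL G CLEF
print(f"{high:04X} {low:04X}")      # D834 DD1E
```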
Please see:
http://www.unicode.org/unicode/consortium/distlist.html
All the details.
Rick
Isn't accretion disk something that forms around a black hole?
Tex, et al --
... have added a Tengwar entry [...] to the Plane 1 Demo page.
Woah! Hang on there!
I would like to voice a shout of vehement discouragement about this sort
of thing. Tex wrote it's not officially in Unicode yet -- which still
means it isn't in Unicode.
Making an entry in
At least two of the links from
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE-old/
showed empty pages (or nothing) in my browser a while ago.
Oops. The entire directory APPLE-old is old and obsolete. It was just
moved aside when the new files came in. It has now been removed.
Robert Palais wrote:
Nelson Beebe recommended it since he figured unicode 3.2 would be
the make or break for getting it in use.
Speaking not officially, but as someone who has been lurking around here
awhile, the Unicode Technical Committee does not generally float trial
balloons. In
For those that have not heard about the Unicode standard, you may want to
download the pdf file that describes it at
http://www.unicode.org/charts/PDF/U1D100.pdf
That isn't the first place I would say describes the encoding. That is
just the final code chart and name list. (Which is, of
R. Palais wrote...
Which seems to make Unicode a defender of the status quo. Inaction is
as political as action. We are holders of the standards
for the technology for encoding symbols, and we won't admit new symbols
until they are widely used... not necessarily the intent, but possibly
the
David Starner wrote:
If the symbols in Unicode make a political
statement by being there, then Unicode supports Christianity (U+2626 and
others), anti-Christianity (U+FB29), Islam (U+262A), Hippies (U+262E),
Communism (U+262D), and Dharma (U+2638).
Ahem... Not to mention Turtles. ;-)
Doug Ewell reported:
Many of the embedded images in the Standardized Variants
document are missing.
The missing images have been fixed.
Rick
David Hopwood wrote:
We can *guess* what the column two glyphs look like from the descriptions,
I suppose, but isn't it kind of important to have images of them?
Heh... Well, yeah, theoretically. We just don't have any glyphs for some
of the things in column 2. The items in column 1 will
I would just encode the 20 numerals. However, nobody has yet come up with
a comprehensive proposal, so I would defer any discussion to the point at
which some expert(s) have an opinion about the script in general.
Rick
Michael Everson wrote:
Any candidate for encoding has to meet certain criteria. Like Klingon
didn't. One of those criteria would be doable. Another would be
meets user requirements. A priori rejection of things makes me
nervous, though.
Yeah. I agree that a priori rejection of Labanotation,
Ken let the cat out of the bag:
Unicode 5.0 will be published on December 22, 2007...
complete with a remastered Unicode hymn...
It's true. We've already booked an Abbey Road studio for five days in
March 2007, and we've signed 75 of the hottest young voices in the world to
be in the
Unicode now has a serious competitor.
Kllhk!! Kllhk!! Kllhk! Whoa! Almost choked on my tofu burger!
Oh dewd, you have it so, like, all wrong... Universal character encoding
isn't about Competition and Marketing, it's about everybody doin' it in the
road, all together like, in love, peace,
Corporations have placed his face on the product labels of hazardous
materials, and publishing companies have used the symbol in textbooks
and standardized tests to represent poisons.
Don't get all excited yet, Michael. It says in textbooks not as a
character in running text.
Rick