At 10:35 AM 6/14/00 -0800, you wrote:
At 09:57 AM 06/13/2000 -0800, Otto Stolz wrote:
Off-topic
Am 2000-06-13 um 17:49 h hat Alain geschrieben:
[Having pictograms everywhere] is much lighter than having to provide
indications, say, in 12 languages (most common example: toilets).
Watch out
At 05:29 AM 6/23/00 -0800, [EMAIL PROTECTED] wrote:
Yes. The Unicode Standard will deprecate the use of U+FEFF (Note: not
U+FFFE) as a zero-width non-breaking space (despite its formal name).
And U+FEFF should *only* be used as a byte order mark and/or signature.
(That is already ambiguous
At 06:31 AM 6/29/00 -0800, you wrote:
Thanks to all for your comments. Has anyone actually used these tags
yet?
Maybe we should postpone these tags for a while until we get a louder
answer to your question, Doug. Once coded, here forever.
A./
At 09:16 AM 7/2/00 -0800, Doug Ewell wrote:
The problem with the phrase "plain text ceases to be plain if you decide
that layout information needs to be encoded" is the word "layout." In
the broadest sense, line and paragraph separation could be considered
"layout," and nobody would suggest
Doug's point is well taken. It's been the editorial committee's policy to
make sure that TR's can be accessed from a wide variety of browsers. If
limiting the range of formatting that is to be used in TR's makes a real
difference to people in the implementers community, then that is something
At 12:18 PM 7/11/00 -0800, [EMAIL PROTECTED] wrote:
What about F? I was told that there are 0x10
possible characters?
Oh, by the way, if 12 is a dozen and 144 is a gross,
what are 16 and 256?
There are 0x10 - 34 possible characters!
All code values ending in 0xFFFE and Ox do
At 01:25 PM 7/11/00 -0800, Leon Spencer wrote:
Has ISO addressed the Euro character?
Yes. It's at 0x20AC in ISO/IEC 10646-1.
There has been an attempt to create a series of 'touched up' 8859
standards. The problem with these is that you get all the issues of
character set confusion that
At 12:56 PM 7/11/00 +, [EMAIL PROTECTED] wrote:
If you bought a copy of the book, you would have known.
I saw 2.0 in the Barnes & Noble book store the other evening,
but they only had one left and it was a struggle to get to it through
the competing crowd... Of course, they were competing
At 07:50 AM 7/13/00 -0800, Antoine Leca wrote:
Alex Bochannek wrote:
A similar issue was very interesting to observe in France and
Germany. The use of the English language in advertisement seems to run
rampant in Germany while almost all ads that include English in France
(mostly tag
There's no updating needed. The key is that The Unicode Standard, Version
3.0 recognizes UTF-16 as the default encoding. Therefore code values (or
units) which are defined as 'minimal bit combination that can represent a
unit of encoded text' are 16-bit. In UTF-16, one sometimes needs two of
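As an illustration of code units versus code points, here is a minimal Python sketch of how a code point outside the first 65,536 values maps to two 16-bit units. The function name `utf16_code_units` is made up for this example; the arithmetic is the usual surrogate-pair calculation.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units (as integers) for a single character."""
    cp = ord(ch)
    if cp < 0x10000:
        # One 16-bit code unit is enough.
        return [cp]
    # Otherwise split the 20-bit offset into a high and a low surrogate.
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


print([hex(u) for u in utf16_code_units("A")])           # one code unit
print([hex(u) for u in utf16_code_units("\U0001D11E")])  # two code units
```

The result can be cross-checked against Python's own encoder, for example `"\U0001D11E".encode("utf-16-be")`.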
At 08:17 AM 7/20/00 -0800, John O'Conner wrote:
2. Compiling your app as a UNICODE application means that all Win32 API calls
use Unicode-enabled versions of the API. Text areas expect you to pass
Unicode, and it displays correctly when an appropriate font is used.
Even if you don't compile an
At 09:53 AM 7/20/00 -0800, Ken Krugler wrote:
2. Is little-endian UCS-2 a valid encoding that I just don't know about?
Yes, it is. Your example of the VFAT system is a near perfect case, since
the details of it form what Unicode calls a 'Higher level protocol' and
those may legitimately override
At 11:34 AM 7/20/00 -0800, John Cowan wrote:
1. Could it be using UTF-16LE? I tried creating an entry with a
surrogate pair, but the name was displayed with two black boxes on a
Windows 2000-based computer, so I assumed that surrogates were not
supported.
Probably not. So technically it
At 11:41 AM 7/20/00 -0800, Ken Krugler wrote:
No. UCS-2 and UCS-4 have always been bigendian. Read ISO 10646-1:1993,
section "6.3 Octet order" (page 7):
When serialized as octets, a more significant octet shall
precede less significant octets.
The section continues: "When not serialized
At 03:42 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote:
Patrick Andries wrote:
De : [EMAIL PROTECTED]
On page 876, the character U+6B8B is listed as being
127 strokes beyond the radical. I'd say it's more
like 6 strokes beyond the radical.
I believe it to be 5 strokes and it is already
At 04:58 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote:
If UCS-2LE is a *standard* encoding (and it is in fact mentioned in UTR-17),
how do VFAT directories qualify as a "higher level protocol"?
My understanding of "higher level protocol" is that it is a *non* standard
usage of some kind, allowed
At 07:14 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote:
Why does it say there are three varieties when a 16-bit datum can only be
serialised in two orders? If the scheme UTF-16 doesn't have a BOM, isn't it
just one of the other two? When it does have a BOM, it can still be
serialised in two ways, so
At 12:13 PM 7/28/00 -0800, Roozbeh Pournader wrote:
I was not talking about the shape. I think all of us have seen it, and
many have also read the documents which define its exact shape using a
ruler and a compass. I was talking about the origin of the shape.
In some sense, except for purists,
At 07:46 PM 7/30/00 -0800, John Cowan wrote:
Yeah, how WOULD you make a serifed, rounded E that
doesn't look silly and doesn't look like a C with
an extra line? Well, maybe you can, I dunno. Anyone
who can do that, I'd like to see it.
At 11:01 PM 8/7/00 -0800, Jianping Yang wrote:
Not really for Unicode in which we have relocated some codepoints for Hangul
between Unicode 1.1 and 2.0 :)
Regards,
Jianping.
"Christopher J. Fynn" wrote:
Allowing changes like this would break
existing implementations of these standards -
At 09:36 AM 8/10/00 -0800, Roozbeh Pournader wrote:
That seems problematic to me, when used for Arabic. How should one use
ZWNJ between two Arabic letters to stop the ligature? They'll get
disconnected!
(in those rare cases...)
Use ZWJ ZWNJ ZWJ and you will get the intended effect.
A./
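For readers who want to try the sequence suggested in the reply, here is a small Python sketch that builds it. The helper name `break_ligature_keep_joining` and the choice of LAM and ALEF as sample letters are assumptions made for illustration; whether the letters actually display joined but unligated depends on the rendering engine and font.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def break_ligature_keep_joining(left, right):
    """Insert ZWJ ZWNJ ZWJ between two letters, as suggested in the reply."""
    return left + ZWJ + ZWNJ + ZWJ + right


text = break_ligature_keep_joining("\u0644", "\u0627")  # LAM, ALEF
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```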
This discussion has become quite "surreal".
In the meantime, I and other people who have the need to write about these
characters have, with more or less encouragement from the Unicode Editorial
Committee started to use the terms "Supplementary Planes", "Supplementary
Characters" etc. This
Preliminary character charts are now available for those characters that
are proposed to go into Unicode 3.2 (and into AMD1 to ISO/IEC
10646-1:2000). The majority of the proposed characters are mathematical
symbols and arrows.
The new URL is:
http://www.unicode.org/charts/draftunicode32/
At 07:44 AM 10/20/00 -0800, you wrote:
Asmus,
Do you have a list of the Unicode 3.1 codes?
Carl
They will appear in due course on ...draftunicode31
You guys are all so eager!
A./
PS: I've made some font fixes for the draftunicode32 charts - however they
don't affect any of the new
At 09:53 PM 12/9/00 -0800, Asmus Freytag wrote:
Hello, UniCoders!
Whatever happened to UniCode Technical Report *#12*, what's it about?! Is
TR12 closer to adoption by UniCode?
Unicode Technical Report 12 was superseded by additions to Unicode 3.0
before it was even advanced to final TR stage
At 12:50 PM 12/11/00 -0800, James Kass wrote:
Michael (michka) Kaplan wrote:
The Windows NT4 charmap does a fair job of this, and the Windows 2000 one
does a better job.
For a presentation that follows the Unicode Standard, try Unibook for NT or
Win95 on http://www.unicode.org/unibook
At 03:38 AM 3/22/01 +, Christopher John Fynn wrote:
But you can also filter mails based on the To: header
"To: [EMAIL PROTECTED]" - every mail client I've seen that supports
filtering lets you filter based on that header.
Except if the message is a cc:...
Actually of more interest to me
At 09:24 AM 4/16/01 +0900, Martin Duerst wrote:
NFC only eliminates things that are supposed to look exactly
the same. NFKC eliminates quite a bit more than that.
NFKC eliminates some things that are quite distinct - it should
not be seen as a general purpose folding mechanism.
A./
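The difference between the two forms can be seen with Python's standard `unicodedata` module. This is only a sketch with hand-picked sample characters (the Angstrom sign, a superscript two and the "fi" ligature), not an exhaustive comparison.

```python
import unicodedata

angstrom_sign = "\u212B"    # ANGSTROM SIGN, canonically equivalent to U+00C5
superscript = "x\u00B2"     # SUPERSCRIPT TWO, compatibility decomposition only
ligature = "\uFB01"         # LATIN SMALL LIGATURE FI, compatibility only

# NFC only replaces canonically equivalent sequences.
nfc_angstrom = unicodedata.normalize("NFC", angstrom_sign)
nfc_superscript = unicodedata.normalize("NFC", superscript)

# NFKC additionally folds compatibility characters, losing distinctions.
nfkc_superscript = unicodedata.normalize("NFKC", superscript)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)

print(nfc_angstrom == "\u00C5")        # True
print(nfc_superscript == superscript)  # True: NFC leaves it alone
print(nfkc_superscript)                # x2
print(nfkc_ligature)                   # fi
```

Here NFKC turns "x²" into "x2", which is the kind of distinction-losing change the warning above is about.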
Date: Thu, 19 Apr 2001 12:59:43 -0700
To: Tomas McGuinness [EMAIL PROTECTED]
From: Asmus Freytag [EMAIL PROTECTED]
Subject: Re: Byte Order Marks
At 02:58 PM 4/19/01 +0200, you wrote:
If it's absent is it safe to assume any particular order (i.e. Big or
Little Endian?)
The default order is Big
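A reader handling UTF-16 data might apply the rule discussed here roughly as follows. This is a minimal Python sketch, not a complete detector: the function name `decode_utf16` is invented for the example, it looks only at UTF-16 byte order marks, and it falls back to big-endian when none is present, following the default named in the reply.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, using the BOM if present, else big-endian."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    # No BOM: assume big-endian (assumption taken from the reply above).
    return data.decode("utf-16-be")


print(decode_utf16(b"\xfe\xff\x00A"))  # BOM says big-endian
print(decode_utf16(b"\xff\xfeA\x00"))  # BOM says little-endian
print(decode_utf16(b"\x00A"))          # no BOM, big-endian assumed
```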
At 10:20 AM 4/20/01 -0400, Dean A. Snyder wrote:
... the Unicode
Consortium should only entertain proposals to the standard after ACTIVELY
seeking the input from the relevant (scholarly) communities - something
which the ICE and UFU projects are doing for two cuneiform script systems.
And, if it
At 03:50 PM 4/20/01 -0500, [EMAIL PROTECTED] wrote:
I say 0 and 1 are adequate. I find this discussion rather pointless
since we all already know that ASCII is adequate if the given premise
is that ASCII is adequate. I don't see what's there to discuss.
We are just trying to see if tautologies
Hear, hear,
At 05:43 PM 4/23/01 -0400, Sarasvati wrote:
Dear Subscribers --
This mail list is a public free-for-all with uncontrolled distribution.
As a corollary, the act of publishing material on this list is tantamount
to unrestricted publication. If you mail out something that should be
Why Unicode will never endorse certain proposals
By making the Private Use Area private, the Unicode Consortium imposed on
itself a restriction to stay absolutely neutral on the use of these
characters. In other words, it cannot promote or
William Overington wrote:
However, there is something that I feel that the Unicode
Consortium could do, if it so wished, without violating
that rule. I suggest that the Consortium could,
if it so chooses, encode one or more regular unicode
characters together with a protocol so that
specifically for use in the kind of
protocol that you describe, it would have shown a preference over other
users of the PUA who either don't use any protocol or use a set of PUA
characters for the same purpose using a different protocol not recognized
by the Consortium. In other words:
Asmus
At 09:54 AM 5/7/01 -0700, Rick McGowan wrote:
Now, Word2000 or some other product, or some specific set of fonts may not
be what a classicist wants, but that limitation is not because the widths
of many characters are somehow CONSTRAINED by the East Asian Width
property.
While that is true, any
For example the MS Mincho font supports these characters as serifed numbers,
the Arial Unicode MS font supports these as sans-serif.
I believe it should be possible to use these fonts with Word 97.
There are many ways to get these fonts, I usually get them by installing
the Far East support for
of the Limbu Script
L2 001-138 Summary proposal form (Limbu)
L2 001-139 Printed samples of Limbu
Please follow up with Rick or Ken if you have issues with any of the contents.
A./
Asmus Freytag
Unicode Liaison to WG2
At 12:02 AM 5/29/01 -0700, James Williams wrote:
Can someone please help me understand whether support for double byte is the
same as being Unicode compliant. Any elaboration would be greatly
appreciated. If for instance, being Unicode compliant has any additional
value/benefits, etc... I'd like
At 01:57 PM 7/23/01 +0900, Martin Duerst wrote:
The language here is slightly different, and I have no idea whether
the intent was exactly the same, but in any case it seems that the
intents were very close to each other.
IA characters were from the beginning intended for in-process use, in
At 11:50 AM 9/7/01 -0500, Ayers, Mike wrote:
Words with the
same spelling and different pronunciation are uncommon but exist in English,
the classic example being read and its own past tense.
Actually, this is a bit more common than you think, since the pronunciation
of vowels in English
At 01:06 PM 9/7/01 -0400, David Gallardo wrote:
As a practical matter, you need to take the diacritics into account when
sorting, even in English where they (may or may not) have linguistic
significance, otherwise you'll get nondeterministic behaviour. In other
words, résumé and resume should
At 09:04 PM 9/7/01 -0700, Mark Davis wrote:
I disagree. What you want is a merged database field. See
http://www.macchiato.com/slides/icu_collation.ppt
Mark
Mark,
David took the remainder of our discussion off the alias. I won't repeat it
here, just to note that we've agreed that merged
At 02:45 PM 9/8/01 -0700, Mark Davis wrote:
If you use a Danish tailoring of the UCA that equates Å and AA (at least at
a primary and secondary level), then they will sort the same way. A string
search that uses the same tailoring will also find Ålborg when given
Aalborg (and vice versa).
But if
Your letter makes clear that Unicode needs to do a better job of
identifying the preferred character code for many situations. The
information is there to a large extent, but buried in the fine print or in
data tables.
You will see that there is a canonical decomposition from U+212B to
At 11:42 AM 9/13/01 +, Marcin 'Qrczak' Kowalczyk wrote:
IMHO Unicode would have been a better standard if UTF-16
hadn't existed.
Decidedly not. In fact, Unicode would not be widely implemented today.
Just UTF-8 and UTF-32, code points in the range
U+..7FFF, no surrogates, no
At 12:26 PM 9/18/01 -0700, Kenneth Whistler wrote:
3.Why don't noBreak formatted Unicode characters
have a canonical decomposition (the compatibility
decomposition surrounded by glue)?
A long story. But the short answer is that such a decomposition
would cause problems for
At 10:21 AM 9/21/01 -0700, Kenneth Whistler wrote:
It is my impression, however, that most significant applications
tend, these days, to be I/O bound and/or network
transport bound, rather than compute bound.
...
We don't hear
much, anymore, about how wasteful Unicode is in its storage
of
There are 66 non-characters as of Unicode 3.1, there were 34 non-characters
before.
There are no hidden non-characters, but there were 'hidden' planes in
Unicode 3.0 - hidden in the limited sense that they were defined as
character and non-character locations, but no characters were assigned,
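The two counts can be reproduced with a short Python sketch. It assumes the usual breakdown: the last two code points of each of the 17 planes (34 in total), plus the 32 code points U+FDD0..U+FDEF that were added to the list in Unicode 3.1. The function name `noncharacters` is made up for the example.

```python
def noncharacters():
    """Return the list of noncharacter code points as of Unicode 3.1."""
    # Last two code points of every plane, planes 0 through 16.
    plane_ends = [
        (plane << 16) | low
        for plane in range(17)
        for low in (0xFFFE, 0xFFFF)
    ]
    # The contiguous block U+FDD0..U+FDEF.
    block = list(range(0xFDD0, 0xFDF0))
    return plane_ends + block


all_nonchars = noncharacters()
print(len([cp for cp in all_nonchars if cp & 0xFFFE == 0xFFFE]))  # 34
print(len(all_nonchars))                                          # 66
```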
At 10:42 PM 10/1/01 -0700, Bernard Miller wrote:
--- Asmus Freytag [EMAIL PROTECTED] wrote:
There are 66 non-characters as of Unicode 3.1, there
were 34 non-characters
before.
I understand now.. the non characters in 16 higher
planes were defined first, then the ones in the arabic
At 03:12 PM 10/24/01 -0400, tom emerson wrote:
Asmus Freytag writes:
FWIW, Robert Bringhurst's The Elements of Typographic Style, 2nd
Edition has in part this to say about caron:
Do you have the date of this book?
1996. It's a fabulous book, 'caron' aside:
I don't doubt
At 06:32 PM 10/24/01 -0500, G. Adam Stanislav wrote:
The first time I encountered the term caron was in the eighties when
studying the design of Adobe PostScript fonts. Not being a native English
speaker, I simply took it for the English word for this diacritic.
This opens up the possibility that
Here's some more info on a possible origin.
A./
Date: Fri, 26 Oct 2001 13:52:38 -0400 (EDT)
From: Barbara Beeton [EMAIL PROTECTED]
well, guys,
i don't think we're going to get anything much better than this.
this recollection predates 1984 by a *long* time.
cheers.
At 05:50 PM 10/31/01 -0800, Kenneth Whistler wrote:
I have no quarrel with the claim that the SCSU scheme could be
implemented directly on UTF-32 data. But as Unicode Technical Standard
#6 is currently written, that is not how to do it conformantly.
Actually, no specific encoding form is
At 11:26 AM 11/7/01 -0800, Eric Muller wrote:
Let's rewind to 1996. I encode a document, and I want a math less-than or
equal character. The picture I want for it has the equal bar slanted.
Looking throughout my Unicode 2.0 standard, I conclude that U+2264,
LESS-THAN OR EQUAL is what I want (with
At 12:37 PM 11/27/01 -0800, James Kass wrote:
Isn't that where it belongs? Default display for isolated combining
marks shows them with the dotted circle.
No it does not. That's an artifact of the Unicode code chart notation.
25CC in many fonts (and in the charts for that matter) looks
At 12:32 PM 11/28/01 +0100, Marco Cimarosti wrote:
I don't think that Unicode requires that a non spacing mark *has* to be
placed on something in order to be displayable. However, some fonts may
choose to represent a stand-alone non spacing mark as floating on some
default glyph, for either
At 01:43 PM 10/9/01 -0400, Gary P. Grosso wrote:
Because of Unicode's Han unification, I was under the impression that
to get both Traditional Chinese and Simplified Chinese to really look
right would require using different fonts for each. To have different
fonts for the same characters in a
At 03:43 PM 10/9/01 -0500, Ayers, Mike wrote:
Oooh - a swing and a miss!
No -- a pretty complete misunderstanding of my posting on your part.
The implication of my statements is that rich text support is required at
least at some level of your architecture as soon as you want to go
We've finally been able to obtain better fonts for the new characters in
CJK-Extension B. The PDF chart is at
http://www.unicode.org/charts/PDF/U2.pdf.
Enjoy.
A./
PS: Fair warning: the complete PDF file is 13MB and contains only the glyph
and code point, no other information about the
W3C's HTML validation service seems to have no such problems.
We've been using it to validate all the files on the unicode
site regularly.
A validator *should* look between the and in order to
catch invalid entity references, esp. invalid NCRs.
For UTF-8, it would ideally also check that no
James,
NCRs *are* markup. And validating that the encoding matches
the declaration (e.g. UTF-8 is not ill-formed) has nothing
whatsoever to do with content, but all with verifying that
the file conforms to the HTML specification.
All this is completely different from spelling and grammar
At 02:01 AM 12/15/01 +, Christian Cooke wrote:
The text annotations to U+000A and U+000D in Unicode 3.0 do not refer to
U+2028 and do not recommend the use of U+2028 as the preferred character
for text processing in this context.
Does the UTC have a recommendation about using U+2028
At 10:38 AM 12/18/01 -0800, Rick Cameron wrote:
It looks like UCS-2 and UCS-4 are defined in ISO 10646. Does that standard
restrict the valid range of UCS-4 to 0..10FFFF?
It will with AMD1 to ISO/IEC 10646-1:2000 which is expected to pass final
balloting and head for publication in 2002.
If
At 03:38 PM 12/18/01 -0800, Rick Cameron wrote:
Are you planning to add an explicit statement to the Unicode standard that
the valid range for scalar values is 0..10FFFF? (Or is such a statement
there, and I've just missed it?)
see below:
In particular, as the use of 32-bit variables to hold
On top of that, it looks like 950 maps a bogus symbol or punctuation
character to U+2574. (2574 is one of a set of 4, and only 1 is mapped for
starters. Fonts covering CP950 give a way different image for that
character than you'd expect from either the charts or the names...
I let some
At 10:38 AM 12/19/01 +, Kevin Bracey wrote:
In message [EMAIL PROTECTED]
Asmus Freytag [EMAIL PROTECTED] wrote:
On top of that, it looks like 950 maps a bogus symbol or punctuation
character to U+2574. (2574 is one of a set of 4, and only 1 is mapped for
starters. Fonts
At 12:34 AM 12/28/01 -0600, [EMAIL PROTECTED] wrote:
If you want to define text/math, and provide the disappearing parenthesis
and precedence tables and everything, then that's fine, but I don't see
why it should be part of Unicode, anymore than full music rendering is part
of Unicode. It's a
At 12:07 PM 12/29/01 +0100, Stefan Persson wrote:
Seeing that Unicode already has left-to-right and right-to-left override
characters, I wonder if a top-to-bottom override character might also be
reasonable.
Which are the code points for these characters?
Please see
At 03:41 PM 12/29/01 -0500, David J. Perry wrote:
The ancient Roman monetary unit sestertius is not yet in Unicode. It might
well be accepted if proposed, but would be given one codepoint. However,
this unit appears in a variety of ways in inscriptions: IIS, HS, II with a
horizontal line
At 02:33 PM 12/30/01 -0500, Tex Texin wrote:
It is a bit inconsistent and therefore confusing.
I searched for bidirectional which immediately pointed me at the
general punctuation pages in a pdf file.
Searching for bidirectional in that file turns up empty.
This is one of the few cases of an
At 12:22 PM 12/31/01 -0500, Tex Texin wrote:
I was fooled by that earlier in the year as well. The links to the other
pages should be at the top of the web page to highlight that the page is
a partial list and to make it easy to reference the other pages. Most
people will not scroll to the bottom
version 2.3+CL 01/14/2001 with nmh-1.0.4
To: Barbara Beeton [EMAIL PROTECTED], Asmus Freytag [EMAIL PROTECTED],
Murray Sargent III [EMAIL PROTECTED]
cc: [EMAIL PROTECTED] (linux-utf8)
Subject: PDUTR #25: Unicode Support for Mathematics
X-URL: http://www.cl.cam.ac.uk/~mgk25/
Date: Thu, 03 Jan
At 06:26 PM 1/15/02 -0800, Kenneth Whistler wrote:
Hello. I am looking for help with Unicode. I was recently told by my
credit card processing company that I need to Upgrade my site to unicode
3.2 in order to get a perl script working.
There has got to be a disconnect here somewhere.
At 10:06 AM 1/18/02 -0700, Robert Palais wrote:
Which seems to make Unicode a defender of the status quo. Inaction is
as political as action. We are holders of the standards
for the technology for encoding symbols, and we won't admit new symbols
until they are widely used... not necessarily the
Just an aside on terminology:
At 08:02 PM 1/18/02 +0100, Marco Cimarosti wrote:
3) A newly added operator (ZWL) which allows joining two characters into a
it's CGJ for Combining Grapheme Joiner
4) A set of operators called Ideographic Description Character (IDC) for
They are for Ideographic
At 11:02 AM 1/18/02 -0800, Barry Caplan wrote:
I've always been under the impression that one of the original goals of
the Unicode effort was to do away with the sort of multi-width encodings we
are all too familiar with (EUC, JIS, SJIS, etc.). This was to be
accomplished by using a fixed width
At 11:36 AM 1/18/02 -0800, Rick McGowan wrote:
It is our job as a standardizing organization to standardize what is IN USE
so that (as a goal) people can standard-ly communicate those symbols
internationally without ambiguity. It is _NOT_ our job, and never will be
our job, to invent new symbols
At 12:48 AM 1/20/02 -0800, James Kass wrote:
The arguments about relative size are true, but in this day and age are
considered unimportant. Graphics files are extremely large in comparison
with text files of any script and so are sound files. Devanagari UTF-8 is
three bytes. The four byte
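The byte counts under discussion are easy to check in Python. This is a small sketch with arbitrarily chosen sample characters; the helper name `utf8_length` is an assumption for the example.

```python
def utf8_length(ch):
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


samples = {
    "LATIN CAPITAL LETTER A": "A",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00E9",
    "DEVANAGARI LETTER KA": "\u0915",
    "CJK Extension B U+20000": "\U00020000",
}

for label, ch in samples.items():
    print(f"{label}: {utf8_length(ch)} byte(s)")
```

The Devanagari letter takes three bytes, and only code points beyond U+FFFF take four.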
At 06:29 AM 1/24/02 +, David Hopwood wrote:
Kenneth Whistler wrote:
And StandardizedVariants.html has been updated again, with more
of the missing glyphs provided.
I can't see any difference between plain U+2278 (either in the draft
code chart or StandardizedVariants.html) and U+2278
At 11:31 AM 1/25/02 -0800, Julie Allen wrote:
John Hudson asked,
As Unicode continues to grow, I wonder if we can expect another book--
or multiple volumes -- at some stage, or if the standard will become a
purely electronic document? Has any decision been taken about this?
There are
At 10:58 PM 1/24/02 +, David Hopwood wrote:
One possibility is to make VS1 specify what is now the reference glyph,
and VS2 specify the alternate glyph. Unmarked would mean either.
Boy, great minds do think alike. I proposed that in a paper to the UTC
last year. ;-)
You realize that this
At 07:40 PM 1/26/02 -0500, [EMAIL PROTECTED] wrote:
One of the new characters scheduled for Unicode 3.2 is
U+213F DOUBLE-STRUCK CAPITAL PI
(A 500-byte GIF is attached.)
Double-struck pi! What better symbol to represent 2 * pi?
These double struck symbols are used by mathematical software
At 12:33 AM 1/27/02 -0800, Mark Davis \(jtcsv\) wrote:
I find it fairly pointless to say that a font supports the variation
selection sequence U+03B8, U+FE00 if it does not provide a visual
distinction from U+03B8; and such a distinction should be based on the
entry description. Thus, of the
At 12:43 PM 1/27/02 -0800, Mark Davis \(jtcsv\) wrote:
It sounds like what you are saying, in concrete terms, is that Font #6
at the bottom of:
http://www.macchiato.com/utc/variation_selection/variation_selection_followup.htm
is conformant. If that is so, then we would have to have an
Kana (Hiragana/Katakana):
Two (essentially) iso-phonic(?) systems, where each symbol
in one set has a corresponding symbol in the other set,
both denoting the same sound value.
The set of forms are historically unrelated.
There is little overlap in the
At 09:42 AM 1/30/02 +0100, Karl Pentzlin wrote:
The question is, are typesetting rules part of the script?
(I mean rules in the sense of obligatory regulations, not guidelines).
This distinction is a very German way of approaching the question.
If yes, (in my opinion) the plain text must carry
Found this message in my outbox; just sending this out now for completeness.
At 12:47 PM 1/7/02 +0330, Roozbeh Pournader wrote:
Accordingly, the Old Italic
script has a default directionality of strong left-to-right in this
standard. When directional overrides are used to produce
At 10:32 AM 2/5/02 -0800, Magda Danish (Unicode) wrote:
Begin forwarded message:
From: [EMAIL PROTECTED]
Date: 2002-02-05 10:44:20 -0800
To: [EMAIL PROTECTED]
Subject: Using Unicode Characters in ASCII Streams
Hallo,
we are a manufacturer of time and attendance terminals which are
At 11:53 AM 2/7/02 -0600, David Starner wrote:
a superset of a number of preexisting character sets, so that it was
possible for those users to move to Unicode without problems. Since
important preexisting character sets separated Greek, Cyrillic and Latin
scripts, Unicode had to. Had Unicode not
At 01:21 PM 2/7/02 -0500, Elliotte Rusty Harold wrote:
I'm not sure Unicode can be fixed at this point. The flaws may be too
deeply embedded. The real solution may involve waiting until companies and
people start losing significant amounts of money as a result of the flaws
in Unicode, and
At 06:18 PM 2/8/02 +0100, Philipp Reichmuth wrote:
Oh, it is very well possible to design a character set that supports
all of Latin, Cyrillic and Greek without being susceptible to this
problem beyond the familiar 1-l-|, 0-O dimension. The main premise is
to encode glyphs instead of characters
At 06:37 PM 2/11/02 +, Juliusz Chroboczek wrote:
We, ASCII-age programmers, are used to considering plain text
rendering as being injective up to binary identity. We carefully
choose fonts that distinguish between O and 0, 1 and l. We use
editors that warn us about non-native line ending
At 09:22 AM 2/14/02 +, Martin Kochanski wrote:
Are there, in fact, many circumstances in which it is necessary for an end
user to create files that do *not* have a BOM at the beginning?
In principle this is a requirement for data being labelled *external to the
data* as being in either
Whether or not they would get support to be encoded is almost irrelevant as
long as no-one comes forward and makes a formal proposal with solid
background information. Only then can this issue be settled where it
matters: in the UTC.
Discussions on open lists like this, unless accompanied by
At 12:37 PM 2/16/02 -0800, Doug Ewell wrote:
Why would anyone, faced with a UTF-8 file that contains invalid
sequences, want to retain the invalid sequences, much less convert the
file to another encoding form that either (a) preserves the invalid
sequences or (b) leaves a marker showing where
Unicode Technical Report #20 at
http://www.unicode.org/unicode/reports/tr20/
has been updated.
This report is jointly published as Unicode
Technical Report and W3C Note. It has been
updated primarily to reflect the addition
of character to Unicode 3.1 and the pending
addition of characters to
At 09:52 PM 2/18/02 -0800, Doug Ewell wrote:
So if some language turns out to need
a with horn in the future, its readers will have to cross its fingers
that rendering engines become capable of displaying U+0061 U+031B
properly.
Support for such arbitrary combination is apparently in the works
Would you by chance mean 'threads' ?
There is a difference, you know ;-)
A./
At 04:49 PM 2/26/02 +0700, Stefan Probst wrote:
Good Evening,
can somebody pls. explain to me dummy, what the long threats about
R(o|u)mania, Canada, California, Yankees, and Initials in various
countries..
At 05:36 PM 2/27/02 -0500, John Cowan wrote:
numbering houses (which seems to be 18th century)
I would have ventured that it is much older than that, dimly recalling some
older maps from a small museum I once visited.
There's a difference between house numbers and street addresses. House