On Thu, Jul 20, 2000 at 09:42:20PM -0800, Doug Ewell wrote:
An SCSU compressor may choose to encode all instances of U+FEFF, not
just the BOM, in the form 0E FE FF. Or it may use another of the
approaches mentioned in the TR. Mine happens to use an SD3 tag (1B A5
FF) for non-initial U+FEFF
On Thu, 20 Jul 2000, David Starner wrote:
So an initial 1B A5 FF is or is not a BOM?
Is not. From UTR#6 http://www.unicode.org/unicode/reports/tr6/#Signature
in reference to the byte sequence 0E FE FF:
| Any other encoding of an initial U+FEFF character, and any encoding of a
| U+FEFF after
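The distinction in this exchange can be sketched in Python. This is a minimal illustration, not a full SCSU decoder: it assumes the UTR #6 window-offset rule that an offset byte x in 0x68..0xA7 maps to x * 0x80 + 0xAC00, and the function names are hypothetical.

```python
# The signature form recommended by UTR #6: SQU (0x0E) then U+FEFF big-endian.
SCSU_SIGNATURE = bytes([0x0E, 0xFE, 0xFF])


def has_scsu_signature(data: bytes) -> bool:
    """Return True only when the data starts with the form 0E FE FF."""
    return data.startswith(SCSU_SIGNATURE)


def sd_window_offset(x: int) -> int:
    """Map a define-window offset byte to a window start (two ranges only)."""
    if 0x01 <= x <= 0x67:
        return x * 0x80
    if 0x68 <= x <= 0xA7:
        return x * 0x80 + 0xAC00
    raise ValueError("offset byte outside the ranges handled in this sketch")


def decode_sd_tag(tag: bytes) -> int:
    """Decode a 3-byte sequence 'SDn, offset byte, window byte' to a code point."""
    sd, x, b = tag
    if not 0x18 <= sd <= 0x1F:
        raise ValueError("first byte is not an SD0..SD7 tag")
    if b < 0x80:
        raise ValueError("third byte is not a dynamic-window byte")
    return sd_window_offset(x) + (b - 0x80)


tag = bytes([0x1B, 0xA5, 0xFF])
print(hex(decode_sd_tag(tag)))      # the character this sequence encodes
print(has_scsu_signature(tag))      # whether it is the signature form
```

Under these assumptions 1B A5 FF does decode to U+FEFF, yet it is a different byte sequence from the signature, which is why a byte-level signature check does not match it.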
Unicode is the code, which is based on 16 bit chunks of ether or whatever,
and UTF-8 is a biased transformation format designed to save American and
Western Europeans storage space and to give some people a warm feeling by
keeping Unicode in the familiar 8 bit world.
Jony
-Original
Patrick Andries wrote:
From: [EMAIL PROTECTED]
On page 876, the character U+6B8B is listed as being
127 strokes beyond the radical. I'd say it's more
like 6 strokes beyond the radical.
I believe it to be 5 strokes and it is already listed under
radical + 5 strokes.
Funny: it is +6
In the meantime, Microsoft is still pretty firmly rooted in the idea that
Unicode=UCS-2 (or UTF-16LE on Windows 2000).
I don't think we can make a blanket statement about MS being firmly rooted
in UCS-2. They're very big and manage lots of code that a lot of people use
on a regular basis, and
On 07/21/2000 04:42:05 AM [EMAIL PROTECTED] wrote:
Unicode is the code, which is based on 16 bit chunks of ether or whatever,
and UTF-8 is a biased transformation format...
That's too simple to capture the current reality, as others have been
indicating. The full story is available in UTR17,
Asmus Freytag wrote:
At 09:53 AM 7/20/00 -0800, Ken Krugler wrote:
2. Is little-endian UCS-2 a valid encoding that I just don't
know about?
Yes, it is. Your example of the VFAT system is a near perfect case, since
the details of it form what Unicode calls a 'Higher level protocol' and
1) Unicode code units are not 8 bits long; deal with it.
Joe
How about "1) Unicode characters don't fit in 8 bits; deal with it."
"Code units" isn't really in the spirit of JOVUC.
--
John Cowan [EMAIL PROTECTED]
C'est la` pourtant que se livre le sens du
1) The UTF whose bits can be counted is not the eternal UTF.
The encoding that is not in UTR-17 is not a compliant encoding.
UCS-2 is the origin of the BMP.
UTF-16 is the origin of 1,048,576 more code points.
Therefore, constantly use UTF-8 and you'll see the mystery on your mail
David Starner [EMAIL PROTECTED] wrote:
So an initial 1B A5 FF is or is not a BOM?
That is correct, it is or is not. :-)
Unfortunately, despite the recommendation in the TR, you have no
guarantee that an initial U+FEFF intended as BOM will be encoded 0E FE
FF while an initial U+FEFF intended
Unicode has changed and evolved over the years. At this point, UCS-2 is a funny
beast, because it shares precisely the same encoding space as UTF-16. That is,
in code units there is absolutely no difference between them. The only real
difference is whether you interpret the code units in the
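A small Python sketch of the code-unit view, added here as an illustration rather than as part of the original message. It relies on the fact that UTF-16 reads a high surrogate followed by a low surrogate as one supplementary character, while UCS-2 defines no such pairing; the helper names are hypothetical.

```python
import struct


def utf16_code_units(text: str) -> list:
    """Return the 16-bit code units of text in UTF-16 (big-endian view)."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))


def combine_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair into one supplementary code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


units = utf16_code_units("\U00010000")
print([hex(u) for u in units])                   # two code units
print(hex(combine_surrogates(units[0], units[1])))  # one code point in UTF-16
```

The pairing covers 0x10000 through 0x10FFFF, which is 1024 * 1024 = 1,048,576 code points beyond the BMP.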
Because of its usage, ZWNBSP is extremely unlikely at the start of a file,
but that doesn't mean it can't occur. A question mark is also extremely
unlikely, as are many other characters. However, they can occur. Unicode
doesn't forbid any sequence of characters from occurring. Stripping, say,
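As an illustration of what unconditional stripping does, here is a minimal sketch using Python's standard codecs (the choice of UTF-8 and of the sample strings is an assumption made for the example). The "utf-8-sig" codec drops one leading U+FEFF; the plain "utf-8" codec keeps it as a character.

```python
# A text whose first character really is U+FEFF, encoded as UTF-8.
data = "\ufeffhello".encode("utf-8")

kept = data.decode("utf-8")          # leading U+FEFF preserved as a character
stripped = data.decode("utf-8-sig")  # leading U+FEFF treated as a signature

print(len(kept), len(stripped))

# Other unlikely-but-legal initial characters are never touched.
question = "?hello".encode("utf-8").decode("utf-8-sig")
print(question)
```

Whether the dropped character was a signature or a real ZWNBSP cannot be told from the bytes alone, which is the point the message makes.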
Jonathan Rosenne wrote:
2) Byte order is only an issue in I/O.
I accept this change.
--
Schlingt dreifach einen Kreis um dies! || John Cowan [EMAIL PROTECTED]
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau, ||
As a serialization, UTF-16 has three forms: UTF-16, UTF-16BE, and
UTF-16LE. The first is with (optionally) a BOM, and the others without.
I know this is what the Standard dictates, and I think I understand why,
but it doesn't make complete sense to the novice trying to find his/her
way:
novice
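For the novice, the three labels can be seen side by side in a short Python sketch. It uses Python's own codec names ("utf-16", "utf-16-be", "utf-16-le") as stand-ins for the three forms; that mapping is an assumption of the example, and the no-BOM behaviour of the "utf-16" codec is deliberately left out.

```python
import codecs

text = "A"

be = text.encode("utf-16-be")      # fixed big-endian order, no BOM written
le = text.encode("utf-16-le")      # fixed little-endian order, no BOM written
with_bom = text.encode("utf-16")   # BOM written first, then platform byte order

print(be, le, with_bom)

# With the "utf-16" label the BOM tells the reader which order follows,
# so both serializations decode to the same text.
from_be = (codecs.BOM_UTF16_BE + be).decode("utf-16")
from_le = (codecs.BOM_UTF16_LE + le).decode("utf-16")
print(from_be == from_le == text)
```

So there are still only two byte orders; the third label says that the order is announced in the data instead of in the label.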
I do not suppose that characters of 128+ strokes are indeed
possible, due to the fact that the paper would get quite soggy
from the repeated strokes.
Well, if they get soggy on little paper just write 'em on bigger paper!
In any case, your supposition is not adequately informed. For
Would anyone like to please translate that into Chinese for the benefit of future
generations?
Rick
1) The UTF whose bits can be counted is not the eternal UTF.
[EMAIL PROTECTED] wrote:
Why does it say there are three varieties when a 16-bit datum can only be
serialised in two orders?
The simplest way to think about it is to remember that a MIME charset is meant
to provide *minimal* information for the receiver to convert bytes into
characters. If
Internationalization (i18N) is the process of bringing your application to
the world... having it work properly and appropriately on any locale.
Localization (L10N) is the process of bringing the world to your
application... localizing is translating the application (not just the
language but
i18N is a lot more than that... and often it can stand on its own. It should
ideally be done before L10N is (and there are examples of bad results when
this was not the case). But I hate to make it sound like one is just a
mere stepping stone to the other.
Just my 2 cents.
michka
-
Would anyone like to please translate that into Chinese for the
benefit of future generations?
Rick
1) The UTF whose bits can be counted is not the eternal UTF.
How about ...
UTF ke3 shu3 fei1 chang2 UTF.
But this leaves the "bits" out which anyway appears to me to be a
commentator's
At 03:42 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote:
Patrick Andries wrote:
From: [EMAIL PROTECTED]
On page 876, the character U+6B8B is listed as being
127 strokes beyond the radical. I'd say it's more
like 6 strokes beyond the radical.
I believe it to be 5 strokes and it is already
At 04:58 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote:
If UCS-2LE is a *standard* encoding (and it is in fact mentioned in UTR-17),
how do VFAT directories qualify as a "higher level protocol"?
My understanding of "higher level protocol" is that it is a *non* standard
usage of some kind, allowed
At 07:14 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote:
Why does it say there are three varieties when a 16-bit datum can only be
serialised in two orders? If the scheme UTF-16 doesn't have a BOM, isn't it
just one of the other two? When it does have a BOM, it can still be
serialised in two ways, so
On 07/21/2000 09:30:59 AM [EMAIL PROTECTED] wrote:
What is the difference between internationalization and localization?
i18n is the process of ensuring that software can be localized. L10n is
the process of actually doing so. Software may be internationalized while
supporting only
Jony Rosenne, who has been a great contributor since or before the
beginning, wrote in an off moment:
UTF-8 is a biased transformation format designed to save American and
Western Europeans storage space and to give some people a warm feeling by
keeping Unicode in the familiar 8 bit world.
On 07/21/2000 12:55:59 PM [EMAIL PROTECTED] wrote:
The problem is that the labels were invented to tag data streams, not to
'label' the result of autodetection. As you point out there are 4 results
of auto-detection:
UTF-16, no BOM
UTF-16, no BOM, but arriving in reverse byte order (for my
With respect to the capitalization rules, I do not recall ever
seeing "i18N" with lowercase i and uppercase N.
The only place where I think it might occur is in one of those
ransom notes where the case varies throughout the message and
the characters are all cut from magazine ads.
"i hAvE yOuR
Would anyone like to please translate that into Chinese for the
benefit of future generations?
Rick
1) The UTF whose bits can be counted is not the eternal UTF.
Jon Babcock suggested:
UTF ke3 shu3 fei1 chang2 UTF.
But this leaves the "bits" out which anyway appears to me
From: [EMAIL PROTECTED]
Of John's and Michael's explanations, I have to say this is the better one.
(Sorry, Michael.)
I won't take it personally. :-)
I simply think that the term Internationalization covers a related but entirely
separate function that does not always lead to localization a
- Original Message -
From: "Asmus Freytag" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, July 21, 2000 12:10 PM
Subject: RE: 127 strokes beyond the radical?!
Patrick Andries wrote:
From: [EMAIL PROTECTED]
On page 876, the character
Patrick asked:
Patrick Andries wrote:
From: [EMAIL PROTECTED]
On page 876, the character U+6B8B is listed as being
127 strokes beyond the radical. I'd say it's more
like 6 strokes beyond the radical.
I believe it to be 5 strokes and it is already listed under
radical + 5
At 4:31 AM -0800 7/17/2000, [EMAIL PROTECTED] wrote:
From: MICHAEL W. MARTIN
Actually, we're off making wild assumptions about the nature of
Michael's problems with no data to work with...
Sorry about that... it was not my intention to keep you in the dark. =)
The project I'm working on