Re: Signature for SCSU

2000-07-21 Thread David Starner
On Thu, Jul 20, 2000 at 09:42:20PM -0800, Doug Ewell wrote: An SCSU compressor may choose to encode all instances of U+FEFF, not just the BOM, in the form 0E FE FF. Or it may use another of the approaches mentioned in the TR. Mine happens to use an SD3 tag (1B A5 FF) for non-initial U+FEFF

Re: Signature for SCSU

2000-07-21 Thread Daniel Biddle
On Thu, 20 Jul 2000, David Starner wrote: So an initial 1B A5 FF is or is not a BOM? Is not. From UTR#6 http://www.unicode.org/unicode/reports/tr6/#Signature in reference to the byte sequence 0E FE FF: | Any other encoding of an initial U+FEFF character, and any encoding of a | U+FEFF after
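
The distinction in this exchange can be checked mechanically. Below is a minimal Python sketch, assuming only what the thread states: the signature is the byte sequence 0E FE FF, and 1B A5 FF is an SD3 tag that also yields U+FEFF. The function names are hypothetical, and the window-offset formula is an assumption based on a reading of UTR#6 for offset bytes 68..A7; this is not a full SCSU decoder.

```python
SCSU_SIGNATURE = bytes([0x0E, 0xFE, 0xFF])  # SQU tag followed by FE FF

def has_scsu_signature(data: bytes) -> bool:
    """Return True only if the stream starts with the recommended signature."""
    return data.startswith(SCSU_SIGNATURE)

def window_offset(offset_byte: int) -> int:
    """Start of a dynamic window for offset bytes 0x68..0xA7 (assumed formula)."""
    if not 0x68 <= offset_byte <= 0xA7:
        raise ValueError("this sketch covers only offset bytes 0x68..0xA7")
    return offset_byte * 0x80 + 0xAC00

def decode_sd3_triplet(data: bytes) -> int:
    """Decode the three-byte form: SD3 tag, offset byte, one window byte."""
    tag, offset_byte, window_byte = data
    if tag != 0x1B:  # 0x1B is the SD3 tag mentioned in the thread
        raise ValueError("expected an SD3 tag")
    if window_byte < 0x80:
        raise ValueError("expected a byte in the dynamic window range")
    return window_offset(offset_byte) + (window_byte - 0x80)

print(has_scsu_signature(bytes.fromhex("0efeff41")))
print(has_scsu_signature(bytes.fromhex("1ba5ff")))
print(hex(decode_sd3_triplet(bytes.fromhex("1ba5ff"))))
```

Both byte sequences carry the same character, but only the first matches the signature test, which is the point Daniel Biddle quotes from the TR.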

RE: Unicode in VFAT file system

2000-07-21 Thread Jonathan Rosenne
Unicode is the code, which is based on 16 bit chunks of ether or whatever, and UTF-8 is a biased transformation format designed to save American and Western Europeans storage space and to give some people a warm feeling by keeping Unicode in the familiar 8 bit world. Jony -Original

RE: 127 strokes beyond the radical?!

2000-07-21 Thread Marco . Cimarosti
Patrick Andries wrote: From: [EMAIL PROTECTED] On page 876, the character U+6B8B is listed as being 127 strokes beyond the radical. I'd say it's more like 6 strokes beyond the radical. I believe it to be 5 strokes and it is already listed under radical + 5 strokes. Funny: it is +6

Re: Unicode in VFAT file system

2000-07-21 Thread Peter_Constable
In the meantime, Microsoft is still pretty firmly rooted in the idea that Unicode=UCS-2 (or UTF-16le on Windows 2000). I don't think we can make a blanket statement about MS being firmly rooted in UCS-2. They're very big and manage lots of code that a lot of people use on a regular basis, and

Re: Unicode in VFAT file system

2000-07-21 Thread Peter_Constable
On 07/21/2000 04:42:05 AM [EMAIL PROTECTED] wrote: Unicode is the code, which is based on 16 bit chunks of ether or whatever, and UTF-8 is a biased transformation format... That's too simple to capture the current reality, as others have been indicating. The full story is available in UTR17,

RE: Unicode in VFAT file system

2000-07-21 Thread Marco . Cimarosti
Asmus Freytag wrote: At 09:53 AM 7/20/00 -0800, Ken Krugler wrote: 2. Is little-endian UCS-2 a valid encoding that I just don't know about? Yes, it is. Your example of the VFAT system is a near perfect case, since the details of it form what Unicode calls a 'Higher level protocol' and

Re: Unicode FAQ addendum

2000-07-21 Thread John Cowan
1) Unicode code units are not 8 bits long; deal with it. Joe How about "1) Unicode characters don't fit in 8 bits; deal with it." "Code units" isn't really in the spirit of JOVUC. -- John Cowan [EMAIL PROTECTED] C'est la` pourtant que se livre le sens du

RE: Unicode FAQ addendum

2000-07-21 Thread Marco . Cimarosti
1) The UTF whose bits can be counted is not the eternal UTF. The encoding that is not in UTR-17 is not a compliant encoding. UCS-2 is the origin of the BMP. UTF-16 is the origin of 1,048,576 more code points. Therefore, constantly use UTF-8 and you'll see the mystery on your mail

Re: Signature for SCSU

2000-07-21 Thread Doug Ewell
David Starner [EMAIL PROTECTED] wrote: So an initial 1B A5 FF is or is not a BOM? That is correct, it is or is not. :-) Unfortunately, despite the recommendation in the TR, you have no guarantee that an initial U+FEFF intended as BOM will be encoded 0E FE FF while an initial U+FEFF intended

Re: Unicode in VFAT file system

2000-07-21 Thread Mark Davis
Unicode has changed and evolved over the years. At this point, UCS-2 is a funny beast, because it shares precisely the same encoding space as UTF-16. That is, in code units there is absolutely no difference between them. The only real difference is whether you interpret the code units in the
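
The point about shared code units can be illustrated with a short Python sketch. It assumes the standard surrogate-pair arithmetic of UTF-16; the helper names are hypothetical. The same 16-bit units are produced either way, and the only question is whether a pair of surrogates is read as two units or combined into one code point.

```python
import struct

def code_units(text: str) -> list[int]:
    """Return the 16-bit code units of text, as big-endian UTF-16 would store them."""
    raw = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(raw) // 2}H", raw))

def combine_surrogates(high: int, low: int) -> int:
    """UTF-16 reading: fold a high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

units = code_units("\U00010000")
print([hex(u) for u in units])          # two 16-bit code units
print(hex(combine_surrogates(*units)))  # one code point under the UTF-16 reading
```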

Re: Signature for SCSU

2000-07-21 Thread Mark Davis
Because of its usage, ZWNBSP is extremely unlikely at the start of a file, but that doesn't mean it can't occur. A question mark is also extremely unlikely, as are many other characters. However, they can occur. Unicode doesn't forbid any sequence of characters from occurring. Stripping, say,

Re: Unicode FAQ addendum

2000-07-21 Thread John Cowan
Jonathan Rosenne wrote: 2) Byte order is only an issue in I/O. I accept this change. -- Schlingt dreifach einen Kreis um dies! || John Cowan [EMAIL PROTECTED] Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com Denn er genoss vom Honig-Tau, ||

Re: Unicode in VFAT file system

2000-07-21 Thread Peter_Constable
As a serialization, UTF-16 has three forms: UTF-16, UTF-16BE, and UTF-16LE. The first is with (optionally) a BOM, and the others without. I know this is what the Standard dictates, and I think I understand why, but it doesn't make complete sense to the novice trying to find his/her way: novice
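
For readers puzzling over the three labels, a small Python sketch may help. It relies on Python's built-in codecs `utf-16`, `utf-16-be`, and `utf-16-le`; note that the behaviour of the plain `utf-16` codec shown here (writing a BOM in the machine's native order when encoding, and honouring a BOM when decoding) is Python's implementation choice, not something taken from this thread.

```python
text = "A"

big = text.encode("utf-16-be")     # fixed byte order, no BOM
little = text.encode("utf-16-le")  # fixed byte order, no BOM
labelled = text.encode("utf-16")   # BOM first, then data in native order

print(big, little, labelled)

# With a BOM present, the plain "utf-16" decoder accepts either byte order.
print(b"\xfe\xff\x00A".decode("utf-16"))
print(b"\xff\xfeA\x00".decode("utf-16"))
```

So a 16-bit datum still has only two byte orders; the third label describes a stream that announces its own order rather than a third order.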

Re: 127 strokes beyond the radical?!

2000-07-21 Thread Rick McGowan
I do not suppose that characters of 128+ strokes are indeed possible, due to the fact that the paper would get quite soggy from the repeated strokes. Well, if they get soggy on little paper just write 'em on bigger paper! In any case, your supposition is not adequately informed. For

Cimarosti's FAQ Tao

2000-07-21 Thread Rick McGowan
Would anyone like to please translate that into Chinese for the benefit of future generations? Rick 1) The UTF whose bits can be counted is not the eternal UTF.

Re: Unicode in VFAT file system

2000-07-21 Thread John Cowan
[EMAIL PROTECTED] wrote: Why does it say there are three varieties when a 16-bit datum can only be serialised in two orders? The simplest way to think about it is to remember that a MIME charset is meant to provide *minimal* information for the receiver to convert bytes into characters. If

Re: What is the difference between i18n and l10n?

2000-07-21 Thread Michael (michka) Kaplan
Internationalization (i18N) is the process of bringing your application to the world... having it work properly and appropriately on any locale. Localization (L10N) is the process of bringing the world to your application localizing is translating the application (not just the language but

Re: What is the difference between i18n and l10n?

2000-07-21 Thread Michael (michka) Kaplan
i18N is a lot more than that... and often it can stand on its own. It should ideally be done before L10N is (and there are examples of bad results when this was not the case). But I would hate to make it sound like one is just a mere stepping stone to the other. Just my 2 cents. michka -

Re: Cimarosti's FAQ Tao

2000-07-21 Thread Jon Babcock
Would anyone like to please translate that into Chinese for the benefit of future generations? Rick 1) The UTF whose bits can be counted is not the eternal UTF. How about ... UTF ke3 shu3 fei1 chang2 UTF. But this leaves the "bits" out which anyway appears to me to be a commentator's

RE: 127 strokes beyond the radical?!

2000-07-21 Thread Asmus Freytag
At 03:42 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote: Patrick Andries wrote: From: [EMAIL PROTECTED] On page 876, the character U+6B8B is listed as being 127 strokes beyond the radical. I'd say it's more like 6 strokes beyond the radical. I believe it to be 5 strokes and it is already

RE: Unicode in VFAT file system

2000-07-21 Thread Asmus Freytag
At 04:58 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote: If UCS-2LE is a *standard* encoding (and it is in fact mentioned in UTR-17), how do VFAT directories qualify as a "higher level protocol"? My understanding of "higher level protocol" is that it is a *non* standard usage of some kind, allowed

Re: Unicode in VFAT file system

2000-07-21 Thread Asmus Freytag
At 07:14 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote: Why does it say there are three varieties when a 16-bit datum can only be serialised in two orders? If the scheme UTF-16 doesn't have a BOM, isn't it just one of the other two? When it does have a BOM, it can still be serialised in two ways, so

Re: What is the difference between i18n and l10n?

2000-07-21 Thread Peter_Constable
On 07/21/2000 09:30:59 AM [EMAIL PROTECTED] wrote: What is the difference between internationalization and localization? i18n is the process of ensuring that software can be localized. L10n is the process of actually doing so. Software may be internationalized while supporting only

RE: Unicode in VFAT file system

2000-07-21 Thread Becker, Joseph
Jony Rosenne, who has been a great contributor since or before the beginning, wrote in an off moment: UTF-8 is a biased transformation format designed to save American and Western Europeans storage space and to give some people a warm feeling by keeping Unicode in the familiar 8 bit world.

Re: Unicode in VFAT file system

2000-07-21 Thread Peter_Constable
On 07/21/2000 12:55:59 PM [EMAIL PROTECTED] wrote: The problem is that the labels were invented to tag data streams, not to 'label' the result of autodetection. As you point out there are 4 results of auto-detection: UTF-16, no BOM UTF-16, no BOM, but arriving in reverse byte order (for my
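
The gap between a label and a detection result can be sketched in Python. This is only an illustration under stated assumptions: the function name and the returned strings are hypothetical, and the sketch looks solely at whether a BOM is present. It does not reproduce the four cases enumerated in the message, which is cut off in this listing.

```python
from typing import Optional

BOM_BIG_ENDIAN = b"\xfe\xff"     # U+FEFF serialised high byte first
BOM_LITTLE_ENDIAN = b"\xff\xfe"  # U+FEFF serialised low byte first

def sniff_utf16_bom(data: bytes) -> Optional[str]:
    """Report the byte order announced by a leading BOM, or None if there is none."""
    if data.startswith(BOM_BIG_ENDIAN):
        return "big-endian"
    if data.startswith(BOM_LITTLE_ENDIAN):
        return "little-endian"
    return None  # no BOM: byte order must come from a label or a guess

print(sniff_utf16_bom(b"\xfe\xff\x00A"))
print(sniff_utf16_bom(b"\xff\xfeA\x00"))
print(sniff_utf16_bom(b"\x00A"))
```

When the function returns None, the data alone cannot say which order it is in, which is where an external label such as UTF-16BE or UTF-16LE carries information that detection cannot.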

Re: What is the difference between i18n and l10n?

2000-07-21 Thread Tex Texin
With respect to the capitalization rules, I do not recall ever seeing "i18N" with lowercase i and uppercase N. The only place where I think it might occur is in one of those ransom notes where the case varies throughout the message and the characters are all cut from magazine ads. "i hAvE yOuR

Re: Cimarosti's FAQ Tao

2000-07-21 Thread Kenneth Whistler
Would anyone like to please translate that into Chinese for the benefit of future generations? Rick 1) The UTF whose bits can be counted is not the eternal UTF. Jon Babcock suggested: UTF ke3 shu3 fei1 chang2 UTF. But this leaves the "bits" out which anyway appears to me

Re: What is the difference between i18n and l10n?

2000-07-21 Thread Michael (michka) Kaplan
From: [EMAIL PROTECTED] Of John's and Michael's explanations, I have to say this is the better one. (Sorry, Michael.) I won't take it personally. :-) I simply think that the term Internationalization covers a related but entirely separate function that does not always lead to localization a

Re: 127 strokes beyond the radical?!

2000-07-21 Thread Patrick Andries
- Original Message - From: "Asmus Freytag" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, July 21, 2000 12:10 PM Subject: RE: 127 strokes beyond the radical?! Patrick Andries wrote: From: [EMAIL PROTECTED] On page 876, the character

Re: 127 strokes beyond the radical?!

2000-07-21 Thread Kenneth Whistler
Patrick asked: Patrick Andries wrote: From: [EMAIL PROTECTED] On page 876, the character U+6B8B is listed as being 127 strokes beyond the radical. I'd say it's more like 6 strokes beyond the radical. I believe it to be 5 strokes and it is already listed under radical + 5

RE: (off-topic and rambling) Subset of Unicode to represent

2000-07-21 Thread Edward Cherlin
At 4:31 AM -0800 7/17/2000, [EMAIL PROTECTED] wrote: From: MICHAEL W. MARTIN Actually, we're off making wild assumptions about the nature of Michael's problems with no data to work with... Sorry about that... it was not my intention to keep you in the dark. =) The project I'm working on