Re: UTF-8N?

2000-06-28 Thread Doug Ewell
Asmus Freytag [EMAIL PROTECTED] wrote: Yes. The Unicode Standard will deprecate the use of U+FEFF (Note: not U+FFFE) as a zero-width non-breaking space (despite its formal name). And U+FEFF should *only* be used as a byte order mark and/or signature. (That is already ambiguous and trouble

Re: UTF-8N?

2000-06-26 Thread Asmus Freytag
At 05:29 AM 6/23/00 -0800, [EMAIL PROTECTED] wrote: Yes. The Unicode Standard will deprecate the use of U+FEFF (Note: not U+FFFE) as a zero-width non-breaking space (despite its formal name). And U+FEFF should *only* be used as a byte order mark and/or signature. (That is already ambiguous

Re: UTF-8N?

2000-06-23 Thread Peter_Constable
On 06/22/2000 10:54:35 PM [EMAIL PROTECTED] wrote: Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP, programs that *expect* UTF-8 instead of SBCS will be able to throw away an initial U+FEFF with even greater confidence. It may even be possible for operating system developers
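[Editor's sketch] The idea of a program that expects UTF-8 throwing away an initial U+FEFF can be illustrated roughly as follows. This is only a minimal sketch, not anything proposed in the thread: the helper name drop_initial_bom is hypothetical, and it assumes the input is already known to be UTF-8.

```python
import codecs


def drop_initial_bom(data: bytes) -> bytes:
    """Remove one leading UTF-8 signature (EF BB BF), if present.

    Hypothetical helper: it assumes the caller already expects UTF-8.
    Only the first signature is removed, so a second U+FEFF that follows
    it is kept as an ordinary character.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


text = drop_initial_bom(b"\xef\xbb\xbfhello").decode("utf-8")
print(text)
```

Python's standard "utf-8-sig" codec does the same thing on decoding, so b"\xef\xbb\xbfhello".decode("utf-8-sig") is an alternative when the bytes are decoded in one step.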

Re: UTF-8N?

2000-06-23 Thread Peter_Constable
Ken: Yes. The Unicode Standard will deprecate the use of U+FEFF (Note: not U+FFFE) as a zero-width non-breaking space (despite its formal name). And U+FEFF should *only* be used as a byte order mark and/or signature. (That is already ambiguous and trouble enough -- without tossing in the

Re: UTF-8N?

2000-06-23 Thread Robert A. Rosenberg
At 10:54 PM 06/22/2000 -0800, Doug Ewell wrote: Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP, programs that *expect* UTF-8 instead of SBCS will be able to throw away an initial U+FEFF with even greater confidence. It may even be possible for operating system developers to

Re: UTF-8N?

2000-06-23 Thread John Cowan
"Robert A. Rosenberg" wrote: It would be very UNCool unless the application can tell the operating system that it wants this done for it. Otherwise it will have no way of KNOWING that the edited stream that the operating system is passing it IS UTF-8 (and was so identified by the deleted

Re: UTF-8N?

2000-06-23 Thread Kenneth Whistler
John Cowan wrote: I think the implication is that the OS provides an interface to read characters out of a text file, in which case BOM-eating BOMophagy, aka FEFFagy ;-) (and masking the difference between various text encodings) is very sensible. Historic OSes have not had such an

Re: UTF-8N?

2000-06-22 Thread Antoine Leca
0xFE 0x20 0x00 ... UTF-8N: 0xEF 0xBB 0xBF 0x20 ... UTF-8B: 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF 0x20 ... There is something I should have missed. It was my understanding that U+FEFF when received as first character should be seen as BOM and not as a character, and handled accordingly. So I expected
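[Editor's sketch] The two UTF-8 byte sequences quoted above for the characters U+FEFF U+0020 can be reproduced with a short Python check. This is an illustration only: it assumes that Python's "utf-8" codec stands in for the thread's "UTF-8N" (no signature added) and "utf-8-sig" for "UTF-8B" (signature prepended). Those codec names are Python's, not terms from the thread.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes the way the message does, e.g. '0xEF 0xBB 0xBF'."""
    return " ".join(f"0x{b:02X}" for b in data)


chars = "\ufeff "  # U+FEFF followed by U+0020

# No signature added: the U+FEFF in the data is encoded like any character.
without_signature = chars.encode("utf-8")

# Signature prepended: the data's own U+FEFF follows the signature.
with_signature = chars.encode("utf-8-sig")

print(hex_bytes(without_signature))  # 0xEF 0xBB 0xBF 0x20
print(hex_bytes(with_signature))     # 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF 0x20
```

The first output is byte-for-byte what a signature followed by a lone space would look like, which is the ambiguity the message is asking about.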

Re: UTF-8N?

2000-06-22 Thread Peter_Constable
On 06/21/2000 03:09:43 PM [EMAIL PROTECTED] wrote: Appropriate or not, users (you know, those people who don't read the documentation that the programmers don't write) will use text editors to split files. They will then concatenate the files using a non-Unicode aware tool. And they will

Re: UTF-8N?

2000-06-22 Thread Christopher John Fynn
[EMAIL PROTECTED] wrote: ... I think the suggestion that BOM and ZWNBSP be de-unified, which I have heard before, may make the best sense. *If* that's the solution, it should be done yesterday. The longer it takes the more implementations (and data) there will be that needs to be changed. -

Re: UTF-8N?

2000-06-22 Thread John Cowan
"Ayers, Mike" wrote: Am I reading this wrong? Here's what I get: I hand you a UTF-16 document. This document is: FE FF 00 48 00 65 00 6C 00 6C 00 6F ..so it says "Hello". Then I say, "Oh, by the way, that's big-endian." *POOF* The content of the document
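[Editor's sketch] The effect described in the quoted question can be shown with the same twelve bytes. This is a minimal illustration that assumes Python's codec behavior: "utf-16" treats a leading FE FF as a byte order mark and removes it, while "utf-16-be" is told the byte order in advance and keeps U+FEFF as a character.

```python
# The document from the message: FE FF 00 48 00 65 00 6C 00 6C 00 6F
raw = bytes.fromhex("FEFF00480065006C006C006F")

# Byte order unknown: the leading FE FF is read as a BOM and consumed.
as_utf16 = raw.decode("utf-16")

# Byte order declared big-endian: FE FF decodes to the character U+FEFF.
as_utf16_be = raw.decode("utf-16-be")

print(repr(as_utf16))     # 'Hello'
print(repr(as_utf16_be))  # '\ufeffHello'
```

The bytes are identical in both cases; only the label supplied alongside them changes whether the text has five characters or six.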

Re: UTF-8N?

2000-06-22 Thread John Cowan
Kenneth Whistler wrote: Now we are pushing through the long, bureaucratic process of getting this accepted into 10646-1, so that we maintain synchronicity with a joint publication of it as a *standard* character. So a fair statement of what you hope to achieve is: U+2060 will be the zero-width

Re: UTF-8N?

2000-06-22 Thread Peter_Constable
On 06/21/2000 06:33:57 PM [EMAIL PROTECTED] wrote: The standard doesn't ever discuss the BOM in the context of UTF-8, See section 13.6 (page 324). Sure enough. Well, there you go: the confusion is officially sanctioned! Peter Constable

Re: UTF-8N?

2000-06-21 Thread Peter_Constable
On 06/20/2000 08:20:53 PM [EMAIL PROTECTED] wrote: [snip] It may be useful shorthand to define the term "UTF-8N" to refer to UTF-8 text that does not begin with a BOM, and reserve the term "UTF-8" for text that *does* begin with a BOM, "UTF-8" currently does n

Re: UTF-8N?

2000-06-21 Thread Peter_Constable
made about any other character (and would have been equally inaccurate). Now suppose we have a character sequence beginning with U+FEFF U+0020. This would be encoded as follows... An unlikely initial character sequence, and the same objections raised above still apply. Without distinct lab

Re: UTF-8N?

2000-06-21 Thread Juliusz Chroboczek
(I've allowed myself to quote from a number of distinct posts.) DE On the contrary, I thought Peter's point was that the OS (or the split/merge programs) should *not* make any special assumptions about text files. Sorry if I wasn't clear. I was taking for granted that OSes will not

Re: UTF-8N?

2000-06-20 Thread Peter_Constable
MD In XML, this situation does not arise, since it specifies the exact usage of BOM, but it can arise in other circumstances. Another recent thread suggests that the situation with BOM and XML is, in fact, *not* clear. AL I understand there is no way to know whether you SHALL/SHOULD/MAY AL