Asmus Freytag [EMAIL PROTECTED] wrote:
Yes. The Unicode Standard will deprecate the use of U+FEFF (Note: not
U+FFFE) as a zero-width non-breaking space (despite its formal name).
And U+FEFF should *only* be used as a byte order mark and/or
signature. (That is already ambiguous and trouble
At 05:29 AM 6/23/00 -0800, [EMAIL PROTECTED] wrote:
Yes. The Unicode Standard will deprecate the use of U+FEFF (Note: not
U+FFFE) as a zero-width non-breaking space (despite its formal name).
And U+FEFF should *only* be used as a byte order mark and/or signature.
(That is already ambiguous
On 06/22/2000 10:54:35 PM [EMAIL PROTECTED] wrote:
Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP, programs
that *expect* UTF-8 instead of SBCS will be able to throw away an initial
U+FEFF with even greater confidence. It may even be possible for operating
system developers
Ken:
Yes. The Unicode Standard will deprecate the use of U+FEFF (Note: not
U+FFFE) as a zero-width non-breaking space (despite its formal name).
And U+FEFF should *only* be used as a byte order mark and/or signature.
(That is already ambiguous and trouble enough -- without tossing in the
At 10:54 PM 06/22/2000 -0800, Doug Ewell wrote:
Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP,
programs that *expect* UTF-8 instead of SBCS will be able to throw away
an initial U+FEFF with even greater confidence. It may even be possible
for operating system developers to
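
A minimal Python sketch of what "throwing away an initial U+FEFF" could look like in a program that expects UTF-8. The helper names `strip_initial_bom` and `read_utf8` are hypothetical and not taken from the thread; the sketch assumes the input is already known to be UTF-8 and that only the very first character is treated as a signature.

```python
BOM = "\ufeff"  # U+FEFF, byte order mark / signature


def strip_initial_bom(text: str) -> str:
    """Drop a single leading U+FEFF, if present (hypothetical helper)."""
    return text[1:] if text.startswith(BOM) else text


def read_utf8(data: bytes) -> str:
    """Decode bytes assumed to be UTF-8 and discard an initial signature."""
    return strip_initial_bom(data.decode("utf-8"))


if __name__ == "__main__":
    print(repr(read_utf8(b"\xef\xbb\xbfhello")))  # signature removed
    print(repr(read_utf8(b"hello")))              # unchanged
```

Only the first U+FEFF is removed; a second one directly after it would be kept as a character, which is the ambiguity the thread is arguing about.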
"Robert A. Rosenberg" wrote:
It would be very UNCool unless the application can tell the operating
system that it wants this done for it. Otherwise it will have no way of
KNOWING that the edited stream that the operating system is passing it IS
UTF-8 (and was so identified by the deleted
John Cowan wrote:
I think the implication is that the OS provides an interface to read
characters out of a text file, in which case BOM-eating
BOMophagy, aka FEFFagy ;-)
(and masking the
difference between various text encodings) is very sensible. Historic
OSes have not had such an
0xFE 0x20 0x00 ...
UTF-8N: 0xEF 0xBB 0xBF 0x20 ...
UTF-8B: 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF 0x20 ...
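
The two UTF-8 byte sequences above can be reproduced with a short Python sketch. It assumes that Python's `utf-8` codec corresponds to the thread's "UTF-8N" (no signature added) and that `utf-8-sig` corresponds to "UTF-8B" (signature prepended); that mapping is an interpretation for illustration, not something stated in the thread.

```python
# Character sequence from the example: U+FEFF followed by U+0020.
text = "\ufeff "

utf8n = text.encode("utf-8")      # no signature added
utf8b = text.encode("utf-8-sig")  # signature EF BB BF prepended

print(utf8n.hex(" "))  # ef bb bf 20
print(utf8b.hex(" "))  # ef bb bf ef bb bf 20

# A reader that always strips a leading signature loses the real U+FEFF
# when it is handed the unsigned form.
lossy = utf8n.decode("utf-8-sig")
print(repr(lossy))
```
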
There must be something I have missed.
It was my understanding that U+FEFF when received as first character should
be seen as BOM and not as a character, and handled accordingly.
So I expected
On 06/21/2000 03:09:43 PM [EMAIL PROTECTED] wrote:
Appropriate or not, users (you know, those people who don't read the
documentation that the programmers don't write) will use text editors to
split files. They will then concatenate the files using a non-Unicode aware
tool. And they will
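
The split-and-concatenate scenario can be sketched in Python. This assumes each piece was saved by an editor that writes a UTF-8 signature (modelled here with the `utf-8-sig` codec) and that the joining tool simply appends bytes; the strings `"first part"` and `"second part"` are made up for illustration.

```python
# Two pieces, each saved with a leading UTF-8 signature.
part1 = "first part".encode("utf-8-sig")
part2 = "second part".encode("utf-8-sig")

# A byte-level, non-Unicode-aware concatenation (like `cat a b > c`).
joined = part1 + part2

# Only the first signature is stripped on decode; the second one
# survives in the middle of the text as U+FEFF.
decoded = joined.decode("utf-8-sig")
print(repr(decoded))
print(decoded.count("\ufeff"))
```
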
[EMAIL PROTECTED] wrote:
... I think the suggestion that BOM and ZWNBSP be
de-unified, which I have heard before, may make the best sense.
*If* that's the solution, it should be done yesterday. The longer it takes the
more implementations (and data) there will be that need to be changed.
-
"Ayers, Mike" wrote:
Am I reading this wrong? Here's what I get:
I hand you a UTF-16 document. This document is:
FE FF 00 48 00 65 00 6C 00 6C 00 6F
..so it says "Hello". Then I say, "Oh, by the way, that's
big-endian." *POOF* The content of the document
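
The effect described here can be seen with Python's codecs, as a rough illustration. It assumes that `utf-16` stands for "byte order unknown, so a leading FE FF is a BOM" and that `utf-16-be` stands for "byte order declared out of band, so a leading FE FF is a character".

```python
# The document from the example: FE FF followed by "Hello" in UTF-16BE.
data = bytes.fromhex("FE FF 00 48 00 65 00 6C 00 6C 00 6F")

# Byte order not declared: FE FF is consumed as the byte order mark.
undeclared = data.decode("utf-16")

# Byte order declared as big-endian: FE FF decodes to a U+FEFF character.
declared_be = data.decode("utf-16-be")

print(repr(undeclared), len(undeclared))
print(repr(declared_be), len(declared_be))
```

The same bytes yield five characters in one case and six in the other, depending only on what the reader was told about the encoding.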
Kenneth Whistler wrote:
Now we are pushing through the long, bureaucratic process of getting
this accepted into 10646-1, so that we maintain synchronicity with a
joint publication of it as a *standard* character.
So a fair statement of what you hope to achieve is: U+2060 will be
the zero-width
On 06/21/2000 06:33:57 PM [EMAIL PROTECTED] wrote:
The standard doesn't ever discuss the BOM in the context of UTF-8,
See section 13.6 (page 324).
Sure enough. Well, there you go: the confusion is officially sanctioned!
Peter Constable
On 06/20/2000 08:20:53 PM [EMAIL PROTECTED] wrote:
[snip]
It may be useful shorthand to define the term "UTF-8N" to refer to UTF-8
text that does not begin with a BOM, and reserve the term "UTF-8" for text
that *does* begin with a BOM,
"UTF-8" currently does n
made about any other character (and would have been
equally inaccurate).
Now suppose we have a character sequence beginning with U+FEFF U+0020.
This would be encoded as follows...
An unlikely initial character sequence, and the same objections raised
above still apply.
Without distinct lab
(I've allowed myself to quote from a number of distinct posts.)
DE On the contrary, I thought Peter's point was that the OS (or the
DE split/merge programs) should *not* make any special assumptions
DE about text files.
Sorry if I wasn't clear. I was taking for granted that OSes will not
MD In XML, this situation does not arise, since it specifies the exact
usage of BOM, but it can arise in other circumstances.
Another recent thread suggests that the situation with BOM and XML is, in
fact, *not* clear.
AL I understand there is no way to know whether you SHALL/SHOULD/MAY AL