Doug Ewell d...@ewellic.org wrote:
|Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
| Not necessarily true.
|
| [602 words]
|
|This has nothing to do with the scenario I described, which involved
|removing a BOM from the start of an arbitrary fragment of data,
|thereby
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
If you have an arbitrary fragment of data, don't fiddle with it.
This is your scenario. The simple concept of a unique start of text
does not exist in live streams that can start anywhere. So you cannot
always expect that U+FEFF or
2014-06-05 0:48 GMT+02:00 Doug Ewell d...@ewellic.org:
If you are processing arbitrary fragments of a stream, without knowledge
of preceding fragments, as in this example, then you have no business
making *any* changes to that fragment based on interpretation of that
fragment as Unicode text.
On Thu, 5 Jun 2014 09:41:07 +0200
Philippe Verdy verd...@wanadoo.fr wrote:
You'll probably want to sync on the first newline control and then
proceed from that point. But now if you have those devices configured
heterogeneously and generating their own output encoding you won't
necessarily
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
Not necessarily true.
[602 words]
This has nothing to do with the scenario I described, which involved
removing a BOM from the start of an arbitrary fragment of data,
thereby corrupting the data because the BOM was actually a ZWNBSP.
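
The failure described above can be illustrated with a minimal Python sketch. It assumes the fragments are already decoded strings and that the caller knows which fragment opens the stream; the names `strip_signature`, `reassemble`, and `naive_reassemble` are hypothetical and do not come from the thread.

```python
BOM = "\ufeff"  # U+FEFF: a signature at stream start, ZWNBSP anywhere else


def strip_signature(fragment: str, is_stream_start: bool) -> str:
    """Drop a leading U+FEFF only when the fragment is known to open the stream."""
    if is_stream_start and fragment.startswith(BOM):
        return fragment[1:]
    return fragment


def reassemble(fragments):
    """Join fragments, treating only the first one as the start of the text."""
    return "".join(strip_signature(f, i == 0) for i, f in enumerate(fragments))


def naive_reassemble(fragments):
    """Strip a leading U+FEFF from every fragment (the problematic behaviour)."""
    return "".join(f[1:] if f.startswith(BOM) else f for f in fragments)


# The second fragment happens to begin with a U+FEFF that is real text data.
fragments = ["\ufeffabc", "\ufeffdef"]
careful = reassemble(fragments)
naive = naive_reassemble(fragments)
```

Here `careful` keeps the U+FEFF that begins the second fragment, while `naive` silently deletes it. A process that sees only one fragment has no way of knowing whether it is the first, which is why it should leave the data alone.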
2014-06-05 21:46 GMT+02:00 Doug Ewell d...@ewellic.org:
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
Not necessarily true.
[602 words]
This has nothing to do with the scenario I described, which involved
removing a BOM from the start of an arbitrary fragment of data,
How common is it to see any of the following in real-world Unicode text,
as opposed to code charts and test suites and the like?
1. Unpaired surrogates
2. Noncharacters (besides CLDR data)
3. U+FEFF at the beginning of a stream (note: not packet or arbitrary
cutoff point)
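
One rough way to measure the three cases listed above on real data is to scan a sequence of UTF-16 code units and count them. The Python sketch below does that; it assumes the input is a list of 16-bit integers, and the function names are made up for illustration. It can only report that a stream begins with U+FEFF. It cannot tell whether that code point was meant as a signature or as a zero-width no-break space.

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def scan_utf16(units):
    """Count corner cases in a sequence of UTF-16 code units (ints 0..0xFFFF)."""
    counts = {"unpaired_surrogates": 0, "noncharacters": 0, "leading_feff": 0}
    n = len(units)
    if n and units[0] == 0xFEFF:
        counts["leading_feff"] = 1
    i = 0
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: well formed only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                i += 2
            else:
                counts["unpaired_surrogates"] += 1
                i += 1
                continue
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            counts["unpaired_surrogates"] += 1
            i += 1
            continue
        else:
            cp = u
            i += 1
        if is_noncharacter(cp):
            counts["noncharacters"] += 1
    return counts


sample = [0xFEFF, 0x0041, 0xD800, 0x0042, 0xD83D, 0xDE00, 0xFFFE, 0xDC00]
result = scan_utf16(sample)
```

For `sample`, the scan finds a leading U+FEFF, two unpaired surrogates (the lone 0xD800 and the trailing 0xDC00), and one noncharacter (U+FFFE); the pair 0xD83D 0xDE00 is well formed and is not counted.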
I'm not asking whether
a text transport or something.
Usually that bites them sooner or later.
-Shawn
-----Original Message-----
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Doug Ewell
Sent: Wednesday, June 4, 2014 11:01 AM
To: unicode@unicode.org
Subject: Corner cases (was: Re: UTF-16 Encoding
Sorry, I left out an important detail.
I wrote:
3. U+FEFF at the beginning of a stream (note: not packet or
arbitrary cutoff point)
I meant U+FEFF as a zero-width no-break space. Obviously it is very
common to see U+FEFF as a signature or BOM.
My underlying question here is, how common is
On 6/4/2014 11:26 AM, Doug Ewell wrote:
Sorry, I left out an important detail.
I wrote:
3. U+FEFF at the beginning of a stream (note: not packet or
arbitrary cutoff point)
I meant U+FEFF as a zero-width no-break space. Obviously it is very
common to see U+FEFF as a signature or BOM.
My
On Wed, 04 Jun 2014 11:40:11 -0700
Asmus Freytag asm...@ix.netcom.com wrote:
On 6/4/2014 11:26 AM, Doug Ewell wrote:
I meant U+FEFF as a zero-width no-break space. Obviously it is very
common to see U+FEFF as a signature or BOM.
The semantics of it were chosen at the time to make no sense
On 6/4/2014 12:21 PM, Richard Wordingham wrote:
On Wed, 04 Jun 2014 11:40:11 -0700
Asmus Freytag asm...@ix.netcom.com wrote:
On 6/4/2014 11:26 AM, Doug Ewell wrote:
I meant U+FEFF as a zero-width no-break space. Obviously it is very
common to see U+FEFF as a signature or BOM.
The semantics
Richard Wordingham richard dot wordingham at ntlworld dot com wrote:
The example that's usually given [of U+FEFF at the start of a stream]
is that of a text file sliced into segments to avoid file size limits.
In these cases, there is the risk that U+FEFF as ZWNBSP will wind up
at the start