From: Doug Ewell [EMAIL PROTECTED]
Theodore H. Smith delete at elfdata dot com wrote:
- the file mixes UTF-8 and UTF-16
Does this file mix UTF-8 and UTF-16? I thought it just had surrogates
encoded into UTF-8? Of course a surrogate should never exist in UTF-8.
You are right. Philippe's statement
On Tue, 12 Oct 2004 20:25:16 +0200, Philippe Verdy [EMAIL PROTECTED] wrote:
From: Doug Ewell [EMAIL PROTECTED]
Theodore H. Smith delete at elfdata dot com wrote:
- the file mixes UTF-8 and UTF-16
Does this file mix UTF-8 and UTF-16? I thought it just had surrogates
encoded into UTF-8?
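
To make the question above concrete, here is a minimal Python sketch of what "surrogates encoded into UTF-8" looks like at the byte level. The function names are hypothetical, made up for illustration, and the sketch assumes Python 3's built-in codecs, whose `surrogatepass` error handler allows surrogate code points to be written as 3-byte sequences.

```python
def cesu8_like(ch: str) -> bytes:
    """Encode one supplementary character as two 3-byte surrogate sequences.

    Hypothetical helper for illustration; this is the CESU-8-style form,
    not well-formed UTF-8.
    """
    units = ch.encode("utf-16-be")               # 4 bytes: high + low surrogate
    hi = chr(int.from_bytes(units[:2], "big"))   # e.g. U+D800
    lo = chr(int.from_bytes(units[2:], "big"))   # e.g. U+DC00
    # "surrogatepass" lets the codec emit surrogates as 3-byte sequences.
    return (hi + lo).encode("utf-8", "surrogatepass")


def is_strict_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


six_bytes = cesu8_like("\U00010000")             # ED A0 80 ED B0 80
four_bytes = "\U00010000".encode("utf-8")        # F0 90 80 80
print(six_bytes.hex(), is_strict_utf8(six_bytes))
print(four_bytes.hex(), is_strict_utf8(four_bytes))
```

The 6-byte form contains no UTF-16 code units as such; it is the 3-byte UTF-8 pattern applied to each surrogate, which a strict decoder rejects.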
From: Clark Cox [EMAIL PROTECTED]
unless the file was used as a test for CESU-8
The whole point of the CESU-8-like section is that it is not legal UTF-8.
Except that the document does not even cite CESU-8 but only UTF-16! The
text itself is puzzling as well as nearly all its suggestions about
Philippe Verdy schrieb:
Examples of bad assumptions that a reader could make:
- [quote](...) Experience so far suggests
that most first-time authors of UTF-8 decoders find at least one
serious problem in their decoder by using this file.[/quote]
This suggests to the reader that if their browser or
From: Philipp Reichmuth [EMAIL PROTECTED]
Don't you think you are stretching things a bit? This is a UTF-8 parser
stress test file. If an application opens it in a different encoding,
well, of course the results will be different, and things will not look
UTF-8-ish. Again, this is a
From: Terje Bless [EMAIL PROTECTED]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Theodore H. Smith [EMAIL PROTECTED] wrote:
I'd like to see a UTF-8 stress test file.
The top result on Google for the query UTF-8 Stress Test is
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt.
This test
Thanks Philippe,
in that file, all UTF-8 sequences with 5 bytes or more are invalid
(they are not boundary cases).
Thanks.
So the list of impossible bytes is longer than documented there.
Is it just a case of moving the boundary cases into the impossible
bytes? Or are there impossible bytes
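
As a rough illustration of the "impossible bytes" being discussed: under the current definition of UTF-8 (RFC 3629, which caps sequences at 4 bytes and U+10FFFF), the byte values that can never appear anywhere in well-formed UTF-8 are 0xC0, 0xC1, and 0xF5 through 0xFF. The sketch below is Python, and the names `IMPOSSIBLE_BYTES` and `find_impossible_bytes` are hypothetical, not taken from the test file or from any message in this thread.

```python
# Byte values that cannot occur in well-formed UTF-8 under RFC 3629:
#   0xC0, 0xC1   would only start overlong 2-byte sequences
#   0xF5..0xF7   would start 4-byte sequences above U+10FFFF
#   0xF8..0xFF   would start 5- and 6-byte sequences, or are never used
IMPOSSIBLE_BYTES = frozenset({0xC0, 0xC1} | set(range(0xF5, 0x100)))


def find_impossible_bytes(data: bytes) -> list:
    """Return (offset, byte value) pairs for bytes that are never legal."""
    return [(i, b) for i, b in enumerate(data) if b in IMPOSSIBLE_BYTES]


sample = b"\xc0\x80abc\xf8"
for offset, value in find_impossible_bytes(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

This only flags bytes that are illegal in every position; bytes such as 0x80 or 0xED are legal in some contexts and need a full decoder to judge.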
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Theodore H. Smith [EMAIL PROTECTED] wrote:
I'd like to see a UTF-8 stress test file.
The top result on Google for the query UTF-8 Stress Test is
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt.
HTH, HAND. -link
- --
I suggest you
Theodore H. Smith wrote:
I'd like to see a UTF-8 stress test file.
It should consist of lines of UTF-8, separated each by a newline. Each
line should be malformed. Also, some idea of how to deal with the
malformed UTF-8 should be noted in a separate file.
Really, I just want some way to verify
I'd like to see a UTF-8 stress test file.
It should consist of lines of UTF-8, separated each by a newline.
Each line should be malformed. Also, some idea of how to deal with
the malformed UTF-8 should be noted in a separate file.
Really, I just want some way to verify that I can detect every
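
One way to check a line-per-case file like the one requested is sketched below in Python. It assumes the file is read as raw bytes and split on the newline byte, and it leans on Python's built-in strict UTF-8 decoder as the reference for what counts as malformed; `malformed_lines` is a hypothetical name, and how each malformed line should be handled is left open, as in the original request.

```python
def malformed_lines(data: bytes) -> list:
    """Return 1-based numbers of the lines that are not well-formed UTF-8.

    Lines are split on the raw newline byte (0x0A) before decoding, so a
    malformed sequence on one line cannot hide the lines after it.
    """
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")      # strict mode: raises on malformed input
        except UnicodeDecodeError:
            bad.append(number)
    return bad


sample = (
    b"ok\n"                # line 1: plain ASCII, valid
    b"\xc0\xaf\n"          # line 2: overlong encoding of '/'
    b"\xed\xa0\x80\n"      # line 3: surrogate U+D800 in 3-byte form
    b"\xe2\x82\xac\n"      # line 4: U+20AC, valid
)
print(malformed_lines(sample))
```

In a stress-test file where every line is meant to be malformed, the returned list should name every non-empty line; any line missing from it is a case the decoder failed to detect.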
10 matches