Re: Updated UTF-8 decoder stress test file

Paul Sheer Mon, 04 Sep 2000 03:36:44 -0700

thanks

cooledit seems to pass: it shows both bad UTF-8 sequences
and missing glyphs on a differently coloured background.
I always show as much information as possible. It was
recommended somewhere to show the same glyph for any bad
UTF-8 sequence, but I prefer to show the bad sequence in
full, albeit in a different colour. It is a text editor,
after all.
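The display strategy described above (each bad sequence shown in full rather than collapsed into one substitute glyph) can be sketched in Python. This is not cooledit's code. The helper names `split_for_display` and `render` are hypothetical, and the `<C0 AF>` hex marker stands in for the different background colour an editor would use. The sketch leans on Python's built-in UTF-8 codec, which rejects overlong forms and encoded surrogates but accepts U+FFFE and U+FFFF.

```python
from typing import List, Tuple, Union

# A segment is ("text", str) for decoded text or ("bad", bytes) for raw
# bytes that are not valid UTF-8. Hypothetical representation.
Segment = Tuple[str, Union[str, bytes]]


def split_for_display(data: bytes) -> List[Segment]:
    """Split data into decodable text and runs of invalid bytes (hypothetical helper)."""
    segments: List[Segment] = []
    pos = 0
    while pos < len(data):
        try:
            # Everything from pos onward is valid UTF-8: emit it and stop.
            segments.append(("text", data[pos:].decode("utf-8")))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into data[pos:].
            if exc.start > 0:
                # The bytes before the error are complete, valid characters.
                segments.append(("text", data[pos:pos + exc.start].decode("utf-8")))
            bad = data[pos + exc.start:pos + exc.end]
            if segments and segments[-1][0] == "bad":
                # Merge with the previous bad run so the whole sequence is one unit.
                segments[-1] = ("bad", segments[-1][1] + bad)
            else:
                segments.append(("bad", bad))
            pos += exc.end
    return segments


def render(segments: List[Segment]) -> str:
    """Show bad bytes in full as hex inside <...>, in place of a coloured background."""
    parts = []
    for kind, value in segments:
        if kind == "text":
            parts.append(value)
        else:
            parts.append("<" + " ".join(f"{b:02X}" for b in value) + ">")
    return "".join(parts)


print(render(split_for_display(b"ok \xc0\xaf \xed\xa0\x80 end")))
# prints: ok <C0 AF> <ED A0 80> end
```

Adjacent bad bytes are merged into one run, so a sequence such as C0 AF is displayed as a single marked unit no matter how the codec splits the error.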

-paul

On Sun, 03 Sep 2000 17:12:09 Markus Kuhn wrote:
> I have just updated my UTF-8 decoder stress test file
> 
>   UTF-8-test.txt
> 
> on
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/examples/
> 
> (also part of <http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz>).
> 
> It now contains an additional section 5 with UTF-8 sequences for illegal
> code positions that a good decoder should reject (surrogates, U+FFFE,
> U+FFFF), just like overlong and malformed sequences, for security
> reasons, as well as all the relevant legal boundary conditions for these.
> 
> I hope you'll find it useful for ironing out the last hidden bugs in
> your UTF-8 decoders. Feel free to adapt the file and include it in
> your UTF-8 regression test suites.
> 
> Markus
> 
> -- 
> Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
> 
> -
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/lists/
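For the checks that the new section 5 asks of a decoder, here is a minimal strict decoder sketch in Python. It is not taken from the test file; the function name `decode_strict` and its `reject_noncharacters` flag are hypothetical. It assumes the RFC 3629 limit of at most four bytes per character (up to U+10FFFF). That RFC postdates this message, and the test file also exercises 5- and 6-byte forms, which this sketch rejects as invalid lead bytes. Rejecting U+FFFE and U+FFFF follows the rule stated above; many current decoders, Python's built-in codec among them, accept those two as valid code points, so the flag lets you switch that check off.

```python
from typing import List


def decode_strict(data: bytes, reject_noncharacters: bool = True) -> List[int]:
    """Decode UTF-8 to code points, raising ValueError on anything invalid.

    Hypothetical helper; assumes the RFC 3629 limit of U+10FFFF.
    """
    out: List[int] = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                      # 1 byte: U+0000..U+007F
            out.append(b0)
            i += 1
            continue
        if 0xC0 <= b0 <= 0xDF:             # 2-byte lead
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:           # 3-byte lead
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF7:           # 4-byte lead (F5..F7 fail the range check)
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:                              # stray continuation byte or 5/6-byte lead
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, length):
            b = data[i + k]
            if (b & 0xC0) != 0x80:         # continuation bytes must be 10xxxxxx
                raise ValueError(f"malformed continuation byte at offset {i + k}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp:                    # value fits in a shorter sequence
            raise ValueError(f"overlong encoding of U+{cp:04X} at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"UTF-16 surrogate U+{cp:04X} at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"code point beyond U+10FFFF at offset {i}")
        if reject_noncharacters and cp in (0xFFFE, 0xFFFF):
            raise ValueError(f"U+{cp:04X} at offset {i}")
        out.append(cp)
        i += length
    return out
```

Comparing the decoded value against the minimum for its sequence length is what catches overlong forms such as C0 AF, an overlong encoding of "/".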




