Hi,

> > http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
> > to latest unicode data?
>
> Done.

Now that every standard seems to agree that UTF-8 uses at most 4 (not 6) bytes and that the highest valid Unicode code point is U+10FFFF, I wonder whether the stress test should be updated too. As far as I understand, the preferred new behavior for what used to be a 5- or 6-byte UTF-8 sequence is to emit 5 or 6 replacement characters: the first byte is an invalid lead byte, and each subsequent byte is an unexpected continuation byte.

bye,
Egmont

-- 
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
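
For illustration, here is a minimal Python sketch of that replacement policy. It is not code from the stress test or from wcwidth.c, and the name decode_utf8 is hypothetical. It assumes the RFC 3629 limits (at most 4 bytes, nothing above U+10FFFF) and emits one U+FFFD per maximal invalid subpart, so a former 5/6-byte lead byte such as 0xF8 or 0xFC, and every continuation byte after it, each become a separate replacement character.

```python
REPLACEMENT = 0xFFFD


def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 into code points (RFC 3629 limits assumed), emitting
    one U+FFFD per maximal invalid subpart."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # plain ASCII
            out.append(b)
            i += 1
            continue
        # need = continuation bytes expected; lo..hi = allowed range of the
        # second byte (narrowed to reject overlongs, surrogates, > U+10FFFF).
        if 0xC2 <= b <= 0xDF:
            need, lo, hi, cp = 1, 0x80, 0xBF, b & 0x1F
        elif b == 0xE0:
            need, lo, hi, cp = 2, 0xA0, 0xBF, b & 0x0F
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            need, lo, hi, cp = 2, 0x80, 0xBF, b & 0x0F
        elif b == 0xED:
            need, lo, hi, cp = 2, 0x80, 0x9F, b & 0x0F
        elif b == 0xF0:
            need, lo, hi, cp = 3, 0x90, 0xBF, b & 0x07
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi, cp = 3, 0x80, 0xBF, b & 0x07
        elif b == 0xF4:
            need, lo, hi, cp = 3, 0x80, 0x8F, b & 0x07
        else:
            # 0x80-0xC1 (stray continuation or overlong lead) and 0xF5-0xFF
            # (including the old 5/6-byte lead bytes): invalid on their own.
            out.append(REPLACEMENT)
            i += 1
            continue
        i += 1
        for _ in range(need):
            if i < n and lo <= data[i] <= hi:
                cp = (cp << 6) | (data[i] & 0x3F)
                i += 1
                lo, hi = 0x80, 0xBF  # only the second byte is narrowed
            else:
                # Truncated sequence: one U+FFFD for the prefix consumed so
                # far; the offending byte is re-examined as a new lead byte.
                out.append(REPLACEMENT)
                break
        else:
            out.append(cp)
    return out
```

Under this policy the former 5-byte form F8 88 80 80 80 decodes to five U+FFFD and the former 6-byte form FC 84 80 80 80 80 to six, while a 4-byte sequence beyond the limit, such as F4 90 80 80, yields four.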
