>>>Markus Kuhn said:
 > 4) I also noted that tclUtf:Tcl_UtfToUniChar accepts overlong UTF-8
 > sequences. This can be a security vulnerability and is forbidden in
 > Unicode 3.1. Practical example: a secure UTF-8 decoder must NOT accept
 > any of
 > 
 >   0xc0 0x8A
 >   0xe0 0x80 0x8A
 >   0xf0 0x80 0x80 0x8A
 >   0xf8 0x80 0x80 0x80 0x8A
 >   0xfc 0x80 0x80 0x80 0x80 0x8A
 > 
 > as a valid encoding for U+000a, otherwise this could be used by
 > attackers to bypass ASCII-level integrity checks (e.g. string must be a
 > single line because it contains no 0x0a) before the UTF-8 decoder.
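
To make the quoted requirement concrete, here is a minimal sketch of a
decoder that rejects non-shortest forms. It is not Tcl's Tcl_UtfToUniChar;
the name strict_utf8_decode and its interface are hypothetical. It assumes
the original one-to-six-byte UTF-8 layout (as in RFC 2279), so that all
five sequences listed above are rejected specifically because they are
overlong. A decoder written to the current definition would additionally
reject surrogates and anything above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Tcl.
 * Decodes one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * truncated, malformed, or not in shortest form. */
static size_t strict_utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* min_cp[len] is the smallest code point that needs len bytes. */
    static const uint32_t min_cp[7] = {
        0, 0x0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000
    };
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;

    /* The lead byte determines the sequence length and its payload bits. */
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else if ((s[0] & 0xFC) == 0xF8) { len = 5; c = s[0] & 0x03; }
    else if ((s[0] & 0xFE) == 0xFC) { len = 6; c = s[0] & 0x01; }
    else return 0;                  /* stray continuation byte, 0xFE or 0xFF */

    if (n < len) return 0;          /* truncated input */

    /* Every following byte must look like 10xxxxxx. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* The overlong check: the value must really need this many bytes. */
    if (c < min_cp[len]) return 0;

    *cp = c;
    return len;
}
```

With this check in place, the single byte 0x0A decodes to U+000A, while
0xC0 0x8A and the longer variants above all return 0, so a newline cannot
slip past an ASCII-level filter that ran before decoding.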

Tcl does its best to accept anything on input, but it produces only
shortest-form output.  The one special case is the embedded null (U+0000),
which Tcl encodes as 0xC0 0x80 in order to avoid null-termination problems
with code that is not UTF-aware.  It probably wouldn't break anything to
disallow non-shortest-form UTF-8 for all but this one case.  If you
eliminate the 0xC0 0x80 case as well, you'll have to check to make sure
*everything* is length-encoded rather than relying on null termination.
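
As a rough illustration of that one special case, here is a sketch of an
encoder that emits shortest-form UTF-8 for everything except U+0000, which
it writes as 0xC0 0x80.  This is not Tcl's actual code; the function name
encode_nul_safe and its interface are made up for the example, and it
assumes code points up to U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Tcl's implementation.
 * Writes the encoding of cp into out (at least 4 bytes) and returns
 * the number of bytes written, or 0 if cp is out of range.
 * U+0000 is deliberately written as the two bytes 0xC0 0x80 so the
 * output never contains a 0x00 byte. */
static size_t encode_nul_safe(uint32_t cp, unsigned char *out)
{
    if (cp == 0) {                       /* the one non-shortest form */
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x80) {                     /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                    /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                  /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                            /* not a valid code point */
}
```

Because no encoded character ever produces a 0x00 byte, a string holding an
embedded U+0000 can still be passed to code that stops at the first null.
Drop this exception and every such consumer has to carry an explicit length
instead.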

--Scott


-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
