> That is not necessarily good advice in security issues.

    What harm can it be? It will not be characters that are relevant in any
    syntactical analyses.

Consider: parser 1 knows that a UTF-8 sequence can have
at most 6 bytes, and sees an illegal 5-byte sequence.

Parser 2 knows that a UTF-8 sequence can have at most
4 bytes, and sees an illegal 4-byte sequence followed by
an ASCII symbol.

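A difference in how two parsers interpret the same byte sequence always
has security implications: here, parser 2 sees an ASCII character that
parser 1 swallowed as part of the illegal sequence.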
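To make the disagreement concrete, here is a minimal Python sketch of two
hypothetical naive decoders. It assumes both take the length of a multibyte
sequence from the lead byte's leading 1 bits, cap it at the longest sequence
they know (6 bytes, as in the original RFC 2279 definition, versus 4 bytes,
as in RFC 3629), and skip the whole claimed span when it is illegal, without
checking that the skipped bytes are continuation bytes. The names
`naive_decode`, `claimed_length` and `max_len` are illustrative, not any real
library's API.

```python
from typing import List

ILLEGAL = "\ufffd"  # marker emitted for a rejected sequence


def claimed_length(lead: int) -> int:
    """Count the leading 1 bits of a lead byte.

    0 means ASCII, 1 means a stray continuation byte,
    and 2..8 is the sequence length the byte claims.
    """
    n = 0
    while n < 8 and lead & (0x80 >> n):
        n += 1
    return n


def naive_decode(data: bytes, max_len: int) -> List[str]:
    """Hypothetical naive UTF-8 parser that knows sequences of at most max_len bytes.

    On a malformed sequence it skips every byte the lead byte claims
    (capped at max_len) without checking whether they are continuation bytes.
    """
    out: List[str] = []
    i = 0
    while i < len(data):
        lead = data[i]
        n = claimed_length(lead)
        if n == 0:  # plain ASCII byte
            out.append(chr(lead))
            i += 1
            continue
        n = min(n, max_len)  # never read more than this parser's maximum
        chunk = data[i:i + n]
        try:
            out.append(chunk.decode("utf-8"))  # well-formed sequence: one character
        except UnicodeDecodeError:
            out.append(ILLEGAL)  # the whole claimed span becomes one error marker
        i += n
    return out


payload = bytes([0xF8, 0x88, 0x80, 0x80, 0x2F])  # 5-byte lead byte, then "/"
print(naive_decode(payload, max_len=6))  # parser 1
print(naive_decode(payload, max_len=4))  # parser 2
```

Fed the bytes F8 88 80 80 2F, the 6-byte parser returns a single error
marker, while the 4-byte parser returns an error marker followed by "/". A
decoder that validates every continuation byte and resynchronizes on the
next byte, such as Python's own bytes.decode with errors="replace", does not
let the "/" vanish.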

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
