Your example does not work, because
Well, I semi-agree. It is possible to write a parser that has no problems. On the other hand, one meets many parsers in the real world that were not well written, so the security risk is real.
Any such parser would be unsafe and non-compliant regardless of 4- or 6-byte UTF-8 considerations, so that point is thoroughly moot.
If you encounter any such parser you must consider it to be severely flawed. (After an initial validating pass it is safe to use degenerate parsers which make assumptions about structure, but not before.)
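To make the "initial validating pass" concrete, here is a minimal sketch in Python. The function name is_valid_utf8 is made up for illustration, and the sketch assumes the stricter 4-byte definition of UTF-8 (as in RFC 3629), so 5- and 6-byte forms are rejected outright along with overlong encodings, surrogates, and truncated sequences. Treat it as one possible way to do the check, not as a reference implementation.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is strictly well-formed UTF-8.

    Assumption: the 4-byte limit (code points up to U+10FFFF) applies.
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                      # single-byte (ASCII)
            i += 1
            continue
        # For each lead byte: number of continuation bytes needed, and
        # the allowed range of the FIRST continuation byte.
        if 0xC2 <= b0 <= 0xDF:             # 2-byte; C0/C1 would be overlong
            need, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:                   # 3-byte; exclude overlong forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif b0 == 0xED:                   # 3-byte; exclude surrogates
            need, lo, hi = 2, 0x80, 0x9F
        elif 0xE1 <= b0 <= 0xEF:           # remaining 3-byte leads
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xF0:                   # 4-byte; exclude overlong forms
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b0 <= 0xF3:           # 4-byte
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:                   # 4-byte; exclude > U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                              # stray continuation, C0, C1, F5..FF
            return False
        if i + need >= n:                  # sequence runs past end of input
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):       # remaining continuation bytes
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True
```

Only after this returns True would it be reasonable to hand the same buffer to a faster decoder that assumes every lead byte is followed by the right number of continuation bytes. For example, is_valid_utf8(b"\xc0\xaf") is False, so the overlong encoding of "/" never reaches such a decoder.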
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
