> In contrast, *not* requiring breaking at UTF-8 code points means that clients
> can't do any meaningful validation on text frames.  Which means you might
> as well get rid of text frames entirely.

Why?

You can do streaming validation of UTF-8 without requiring frame boundaries to
observe UTF-8 code point boundaries.

In Python, for example, you can do that using

codecs.getincrementaldecoder('utf-8')()
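A minimal sketch of that approach, assuming each frame payload arrives as a separate `bytes` object and that the receiver knows which frame is the last one of the message. The frame split below is made up for illustration: the 3-byte encoding of U+20AC (EURO SIGN, E2 82 AC) is deliberately cut across the first two frames.

```python
import codecs

# Hypothetical payloads of three consecutive text frames; the 3-byte
# encoding of U+20AC is split between the first and the second frame.
frames = [b"price: \xe2\x82", b"\xac 10", b" total"]

decoder = codecs.getincrementaldecoder("utf-8")()  # errors="strict" by default
parts = []
for i, payload in enumerate(frames):
    last = i == len(frames) - 1
    # final=False lets the decoder hold back an incomplete trailing sequence
    # instead of reporting it as an error; final=True on the last frame turns
    # a sequence truncated at the end of the message into an error.
    parts.append(decoder.decode(payload, final=last))

text = "".join(parts)
print(text)
```

The first frame decodes to just `"price: "`; the two dangling bytes are kept inside the decoder and completed by the next frame, so no frame boundary has to coincide with a code point boundary.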

When a frame does not end on a code point boundary, one needs to remember
at most 3 bytes (the incomplete tail of a multi-byte sequence) to continue
validation with the next frame.
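To check the "at most 3 bytes" figure, the sketch below feeds a message to the decoder one byte at a time (the worst possible framing) and records how much undecoded input the decoder holds after each byte, using `IncrementalDecoder.getstate()`, whose first item is the buffered input. The helper name `max_pending_bytes` is hypothetical.

```python
import codecs

def max_pending_bytes(data: bytes) -> int:
    """Feed `data` one byte per "frame" and return the largest number of
    bytes the decoder ever had to carry over to the next frame."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    worst = 0
    for i in range(len(data)):
        decoder.decode(data[i:i + 1], final=False)
        pending, _flag = decoder.getstate()  # (buffered bytes, extra state)
        worst = max(worst, len(pending))
    decoder.decode(b"", final=True)  # raises if the data ends mid-sequence
    return worst

# U+1F600 takes 4 bytes in UTF-8, so up to 3 of them can be pending.
sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")
print(max_pending_bytes(sample))  # 3
```

Since the longest UTF-8 sequence is 4 bytes, the carried-over state can never exceed 3 bytes, whatever the frame sizes.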

It would make sense to say that a peer SHOULD fail the connection upon invalid
UTF-8 as soon as possible, that is, with at most one frame of delay after the
start of the byte sequence that was invalid UTF-8.
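A rough sketch of such a fail-fast check, assuming the frames of one message are available as a list of payloads. `first_bad_frame` is a hypothetical helper that returns the index of the frame at which validation fails; a real endpoint would fail the connection at that point instead of returning a number.

```python
import codecs
from typing import Iterable, Optional

def first_bad_frame(frames: Iterable[bytes]) -> Optional[int]:
    """Return the index of the frame at which invalid UTF-8 is detected,
    or None if the concatenated payloads are valid UTF-8."""
    frames = list(frames)
    decoder = codecs.getincrementaldecoder("utf-8")()
    for i, payload in enumerate(frames):
        try:
            # Only the last frame of the message is decoded with final=True.
            decoder.decode(payload, final=(i == len(frames) - 1))
        except UnicodeDecodeError:
            return i  # here a peer would fail the connection
    return None

print(first_bad_frame([b"ok \xff", b"more"]))      # 0: bad byte in frame 0
print(first_bad_frame([b"ok \xe2", b"\x28 rest"])) # 1: E2 needs continuation bytes
print(first_bad_frame([b"ok \xe2\x82", b"\xac"]))  # None: valid across frames
```

In these cases an invalid byte is reported in the frame that contains it, while a sequence that starts at the end of one frame and is broken by the first byte of the next is reported when that next frame is processed.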

Anyway: what's the advantage of such a requirement?
_______________________________________________
Gen-art mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/gen-art
