Re: [Gen-art] [hybi] Review of draft-ietf-hybi-thewebsocketprotocol-13

Richard L. Barnes Tue, 06 Sep 2011 08:23:57 -0700

On Sep 6, 2011, at 10:58 AM, Tobias Oberstein wrote:

>> In contrast, *not* requiring breaking at UTF-8 code points means that clients
>> can't do any meaningful validation on text frames.  Which means you might
>> as well get rid of text frames entirely.
> 
> Why?
> 
> You can do streaming validation of UTF-8 without requiring frame boundaries to
> observe UTF-8 code point boundaries.
> 
> In Python you can do that, e.g. using
> 
> codecs.getincrementaldecoder('utf-8')()
> 
> When a frame does not end on a code point boundary, one needs to remember
> at most 3 bytes to continue validation on the next frame.
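A minimal sketch of the streaming approach described above, assuming Python 3 and only its standard `codecs` module. The class `StreamingUtf8Validator` and its methods are hypothetical names for illustration, not part of any WebSocket library. Each call feeds one frame payload. The incremental decoder holds back an incomplete trailing sequence (at most 3 bytes) until the next frame, and `fin` marks the final frame of the message, at which point an unfinished sequence is an error.

```python
import codecs


class StreamingUtf8Validator:
    """Hypothetical helper: validates one text message frame by frame,
    even when a frame boundary splits a UTF-8 code point."""

    def __init__(self) -> None:
        # Strict error handling: invalid UTF-8 raises UnicodeDecodeError.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

    def feed_frame(self, payload: bytes, fin: bool) -> str:
        # With final=False an incomplete trailing sequence is buffered for
        # the next frame; with final=True it raises UnicodeDecodeError.
        text = self._decoder.decode(payload, final=fin)
        if fin:
            # Message complete: start fresh for the next message.
            self._decoder.reset()
        return text

    def pending_bytes(self) -> int:
        # getstate() returns (buffered_bytes, flag); the buffer holds the
        # partial code point carried over between frames.
        buffered, _ = self._decoder.getstate()
        return len(buffered)
```

For example, the euro sign (`b"\xe2\x82\xac"`) split across two frames decodes cleanly: the first call returns `"price: "` with 2 bytes pending, and the second returns `"\u20ac"`.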


If frames are valid UTF-8, then you don't need to keep any state (on either end 
of the connection).
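A sketch of the stateless, frame-wise check this reply argues for. It assumes the proposed rule that every text frame ends on a code point boundary; under that rule each payload can be validated on its own with a plain strict decode. `frame_is_valid_utf8` is a hypothetical helper name.

```python
def frame_is_valid_utf8(payload: bytes) -> bool:
    """Hypothetical helper: stateless check that is only sufficient if every
    text frame is required to end on a UTF-8 code point boundary."""
    try:
        payload.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True
```

Under this rule a frame that splits a code point (e.g. `b"\xe2\x82"`) is rejected outright, so the burden shifts to the sender to fragment only between code points.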


> It would make sense that a peer SHOULD fail a connection upon invalid UTF-8
> as soon as possible, that is, with at most one frame of delay after the
> start of the byte sequence that was invalid UTF-8.
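To illustrate failing early, a sketch assuming CPython 3: the hypothetical helper `first_failing_frame` feeds a message's frames (the last one carrying FIN) to the incremental UTF-8 decoder and returns the index of the frame at which decoding fails, or `None` if the message is valid. CPython's decoder raises as soon as it sees a byte that cannot continue a valid sequence, so the error surfaces at the frame containing that byte rather than at the end of the message.

```python
import codecs
from typing import Iterable, Optional


def first_failing_frame(frames: Iterable[bytes]) -> Optional[int]:
    """Hypothetical helper: index of the frame where invalid UTF-8 is
    detected, or None if the whole message decodes cleanly."""
    frame_list = list(frames)
    decoder = codecs.getincrementaldecoder("utf-8")()  # strict by default
    for i, payload in enumerate(frame_list):
        is_fin = i == len(frame_list) - 1
        try:
            decoder.decode(payload, final=is_fin)
        except UnicodeDecodeError:
            return i
    return None
```

For instance, `[b"ok \xe2", b"\x28rest"]` fails at frame 1, where the bad continuation byte arrives. With very small frames the lag is bounded in bytes rather than frames: a 4-byte sequence sent one byte per frame, `[b"\xf0", b"\x90", b"\x80", b"\x28"]`, is only rejected at frame 3.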
> 
> Anyway: what's the advantage of such a requirement?

The advantage is frame-wise validation instead of message-wise validation.  As 
you point out, it's not a huge distinction, more "be conservative in what you 
send".  It just seems unnecessarily sloppy not to have frame boundaries 
coincide with code point boundaries.

--Richard
_______________________________________________
Gen-art mailing list
https://www.ietf.org/mailman/listinfo/gen-art
