> > When a frame does not end on a code point boundary, one needs to
> > remember at most 3 bytes to continue validation on the next frame.
> 
> If frames are valid utf-8, then you don't need to keep any state (on either
> end of the connection).

You do need to keep at least the opcode of the first frame of a message,
since it determines the type of the whole message. You also need the
continuation state, since you have to detect a protocol violation when you
receive a continuation frame with nothing to continue, or a data frame
with opcode != 0 when a continuation was expected.

That's only 4 bits, but it is still state.
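
To make that concrete, here is a minimal sketch in Python of the state
I mean. The names (FragmentTracker, on_frame, ProtocolViolation) are
made up for illustration and are not taken from any implementation. The
control frame handling (opcodes 0x8 and up may be interleaved but not
fragmented) follows the fragmentation rules of the protocol.

```python
CONTINUATION, TEXT, BINARY = 0x0, 0x1, 0x2


class ProtocolViolation(Exception):
    """Raised when a frame sequence breaks the fragmentation rules."""


class FragmentTracker:
    """Hypothetical helper: holds only the opcode of the message in progress."""

    def __init__(self):
        # None means "no fragmented message in progress".
        self.message_opcode = None

    def on_frame(self, fin, opcode):
        """Return the message type this frame belongs to, or raise."""
        if opcode >= 0x8:
            # Control frames may appear between fragments, but must not
            # be fragmented themselves. They do not touch message state.
            if not fin:
                raise ProtocolViolation("fragmented control frame")
            return opcode
        if opcode == CONTINUATION:
            if self.message_opcode is None:
                raise ProtocolViolation("continuation with nothing to continue")
            kind = self.message_opcode
        else:
            if self.message_opcode is not None:
                raise ProtocolViolation("new data frame while continuation expected")
            kind = opcode
            self.message_opcode = opcode
        if fin:
            # Final frame of the message: forget the remembered opcode.
            self.message_opcode = None
        return kind
```

The whole per-connection state is the one remembered opcode (or "none"),
which is exactly the handful of bits mentioned above.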

You also need state even for unfragmented messages whenever the
frame size exceeds the amount that can be buffered (and a streaming API
is in place).

> > It would make sense that a peer SHOULD fail a connection upon invalid
> > UTF-8 as soon as it is possible - that means with at most 1 frame
> > delay upon the start of the byte sequence that was invalid UTF-8.
> >
> > Anyway: what's the advantage of such a requirement?
> 
> The advantage is frame-wise validation instead of message-wise validation.
> As you point out, it's not a huge distinction, more "be conservative in what
> you send".  It just seems unnecessarily sloppy not to have frame boundaries
> coincide with code point boundaries.

How do you validate a frame of 2^63 octets?

Message-based versus frame-based validation does not help here. What one
needs is incremental (streaming) validation.

And if you have incremental validation, there is no need for frame
boundaries to fall on whole code points. The incremental validator just
keeps its internal state (at most 3 bytes) and you feed it the next chunk
of octets until it bails out with "invalid UTF-8", upon which the
connection is immediately failed.
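
As a rough illustration of incremental validation, here is a sketch in
Python. It leans on the standard library's incremental UTF-8 decoder as
a stand-in for a hand-written validator; the wrapper class and its
method names (StreamingUtf8Validator, feed) are hypothetical. A real
implementation would more likely use a small state machine and throw
away the decoded text.

```python
import codecs


class StreamingUtf8Validator:
    """Hypothetical wrapper: validates UTF-8 chunk by chunk."""

    def __init__(self):
        # The decoder buffers an incomplete trailing sequence between
        # calls, so chunks may split a code point anywhere.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

    def feed(self, chunk, end_of_message=False):
        """Return True if the stream is still valid after this chunk."""
        try:
            # final=True also rejects a sequence left incomplete at the end.
            self._decoder.decode(chunk, final=end_of_message)
        except UnicodeDecodeError:
            return False  # caller fails the connection here
        if end_of_message:
            self._decoder.reset()
        return True


# The euro sign is E2 82 AC; split it across two chunks.
validator = StreamingUtf8Validator()
first_ok = validator.feed(b"\xe2\x82")
second_ok = validator.feed(b"\xac", end_of_message=True)
```

Whether the chunks are frames, parts of frames, or whatever the socket
happened to deliver makes no difference to the validator.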

I really don't get the problem ..



_______________________________________________
Gen-art mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/gen-art
